SELF-ASSEMBLING GENES, VECTORS AND USES THEREOF

Field of the Invention 

This invention relates to the construction and usage of synthetic genes for 
genetic engineering and gene therapy. 

Background of the invention 

This application claims the benefit of provisional application U.S. Serial No.
60/070,910, filed on February 28, 1997, entitled "Self-Assembling Genes."

Recombination at the genetic level is important for generating diversity and
adaptive change within genomes of virtually all organisms. Recombinant DNA technology is
based upon simple 'cut-and-paste' methods for manipulating nucleic acid molecules in vitro.
The pieces of genetic material, or DNA, are first digested with a restriction endonuclease
enzyme which recognizes specific sequences within the DNA. After preparation of two or
more pieces of DNA, the ends of the DNA are further manipulated, if necessary, to make
them compatible for ligation or joining together. DNA ligase, together with adenosine
triphosphate (ATP), is added to the genes, ligating them back together. The genetic assembly
containing an origin of DNA replication and a selectable gene is then inserted into a living
cell, is grown up, and is positively selected to yield a pure culture capable of providing high
yields of individual recombinant DNA molecules, or their products such as RNA or protein.

Significant improvements have been made to this technology over the last two
and a half decades. Numerous enzymes, end-linkers and adapter molecules have been made
commercially available, which facilitate the construction of recombinant DNA molecules.
By using two restriction enzymes with different single-stranded termini or blunt ends, it is
possible to directionally assemble genes (forced cloning). This reduces the amount of
screening required to determine orientation. Procedures have been automated for synthesis of
single-stranded gene fragments up to 200 or more nucleotides in length by means of
phosphoramidite chemistry, and the instrumentation is readily available through Applied
Biosystems, Inc., Foster City, CA. Such single-stranded fragments can be joined by
annealing overlapping complementary phosphorylated strands, and by enzymatically filling in
the ends with DNA polymerase and DNA precursors. In this way, multiple, overlapping,
single-stranded fragments can be assembled into a larger, double-stranded superstructure.
Whole genes have been synthesized by similar methods. However, it becomes increasingly
difficult to use synthetic DNA strands when making genes larger than approximately one
kilobase. Using gene amplification methods (e.g., polymerase chain reaction (PCR), Mullis et
al., U.S. Patent 4,683,195), together with synthetic oligonucleotides, it is possible to make
biologically active, synthetic retro-vectors that are capable of RNA transcription,
reverse-transcription, viral packaging, and integration into genomic DNA (see, for example,
Hodgson, WO94/20608). Hodgson, supra, also disclosed methods for cloning of transcriptional
promoters into such a vector using traditional recombinant DNA technology.

Modified restriction enzyme sites, linkers, and adapters can change the
primary or secondary structure of complex nucleic acid sequences, thereby altering or
obliterating a desired biological activity. For example, small mutations can drastically
modify transcriptional promoters or change the reading frame of coding DNA. A logical goal
of vectorology is to make exact constructs, without need of fortuitous restriction sites,
adapters, or linkers.

Restriction endonucleases can be grouped based on similar characteristics. In
general there are three major types or classes: I, II (including IIS) and III. Class I enzymes
cut at a somewhat random site away from the enzyme recognition site (see Old and Primrose,
1994. Principles of Gene Manipulation, Blackwell Sciences, Inc., Cambridge, MA, p.24).
Most enzymes used in molecular biology are type II enzymes. These enzymes recognize a
particular target sequence (i.e., restriction endonuclease recognition site) and break the
polynucleotide chains within or near to the recognition site. The type II recognition
sequences are continuous or interrupted. Class IIS enzymes (i.e., type IIS enzymes) have
asymmetric recognition sequences. Cleavage occurs at a distance from the recognition site.
These enzymes have been reviewed by Szybalski et al. (Gene 100:13-26, 1991). Class
III restriction enzymes are rare and are not commonly used in molecular biology.

U.S. Patent No. 4,293,652 employed a linker with a class IIS enzyme
recognition sequence to permit synthesized DNA to be inserted into a vector without
disturbing a recognition sequence. Brousseau et al. (Gene 17:279-289, 1982) and Urdea et al.
(Proc. Natl. Acad. Sci. USA 80:7461-7465, 1983) disclose the use of class IIS enzymes for
the production of vectors to produce recombinant insulin and epidermal growth factor,
respectively. Mandecki et al. described a method for making synthetic genes by cloning
small oligonucleotides using a vector (Gene 68:101-107, 1988). Expansion of a population of
oligonucleotides required synthesis, cloning, excision and fragment purification. The
oligonucleotides were used to create a complete plasmid.

Lebedenko et al. (Nucl. Acids Res. 19(24):6757-6771) illustrated the use of class IIS
enzymes and PCR for precisely joining three nucleic acid molecules for conventional sub-cloning
using BamHI. Tomic et al. (Nucleic Acids Res., 18:1656, 1990) reported a method for
site-directed mutagenesis using the polymerase chain reaction and class IIS enzymes to join two
nucleic acid molecules. Two overlapping PCR primers were used where the primers included
class IIS recognition sites. The primers included a region of complementarity to the template
DNA and included one to a few site-directed mutations. Stemmer et al. (U.S. Patent No.
5,514,568) employed overlapping primers with class IIS enzymes to amplify a plasmid and to
introduce specific mutations into DNA, leaving all other positions unaltered.

There remains a need for the ordering and assembly of complex genes to
overcome the problems associated with sequential sub-cloning such as multiple purification
steps, the potential for sample loss, and the like. Moreover, there is a need for eliminating the
use of prokaryotic hosts and for minimizing or avoiding the risks associated with bacterial
contamination resulting from the use of bacteria as intermediaries in the cloning process.
Further, there remains a need for efficient methods to assemble large nucleic acid molecules
or many-fragmented nucleic acid assemblies with precision.

Brief Description of the Figures

Fig. 1A provides one schematic of six double stranded DNA fragments, each
terminus comprising a unique overhanging two-nucleotide sequence complementary to only
one other terminus.

Fig. 1B illustrates a three-piece ligation where 100% of the clones tested contained
the predicted fragment order and desired fragment orientation.

Fig. 2 illustrates the use of a class IIS restriction endonuclease (as one example,
BpmI), restriction endonuclease recognition site and the selection of cohesive overhanging
ends.

Fig. 3A illustrates an exemplary retrotransposon-derived vector including a murine
VL30 LTR (NLV-3) and packaging signal, an internal ribosome entry site (IRES) from
encephalomyocarditis virus (EMCV), a gene encoding a green fluorescent protein (GFP),
additional internal VL30 sequences (solid bar), SV40 early region promoter and Tn5
aminoglycoside phosphotransferase (neo) gene, pBR322 plasmid origin of replication and a
plus-strand primer binding site (VL30). An exemplary vector sequence is provided as
VLBPGN (SEQ ID NO: 1). Fig. 3B is an illustration of an LTR with the insertion of a U3
(transcriptional promoter) region rescued by reverse transcriptase-polymerase chain reaction
(RT-PCR). The promoter is amplified from the RNA of a cell expressing the VL30 U3
region. Complementary overhanging ends are created using class IIS restriction
endonuclease digestion sites within the LTR and within the promoter. Fig. 3C provides the
linear structure of a VL30 RNA transcript from a mouse cell with a U3 region near the
3'-terminus of the RNA molecule. PCR primers include a class IIS enzyme recognition site to
amplify the U3 region from the RNA, resulting in a double stranded DNA molecule. Cleavage
with a class IIS enzyme (here BpmI) results in a double-stranded DNA molecule with ends
complementary to a site in the vector of Fig. 3A.

Fig. 4A is a schematic illustrating steps for assembling a combinatorial library of cis-
or trans-acting nucleic acid sequences for assembly and screening, useful for the rescue of
biologically active species. Fig. 4B is a diagram of a U3 (transcriptional enhancer and
promoter) region of an LTR illustrating several sub-divisions of the transcriptional control
region, including a distal enhancer region, an enhancer repeat region, a medial promoter and a
proximal promoter. These regions have been described for other vectors in Hodgson et al.
(1996. "Construction, Transmission and Expression of Synthetic VL30 Vectors" in Hodgson
ed. Retro-vectors for Human Gene Therapy. RG Landes Company, Austin TX). Segments
of these regions are amplified using primers for highly conserved sequences. Highly
conserved sequences are determined based on a comparison of known VL30 sequences such as
those provided in Fig. 4.2 of Hodgson, 1996, infra. The parts are joined by annealing and ligation
to provide an ordered assembly. Each construct is an allele or a representative of allelic
variation in the combinatorial library.

Fig. 5 discloses two transcriptional promoters that have been rescued from mouse
VL30 RNA sequences isolated from a mouse T-helper cell library. These promoters were
assembled into a vector and introduced into retroviral helper cells and packaged into
recombinant retrovirus for introduction into human T cells. After transduction to human T
cells, a β-galactosidase reporter gene was expressed from the T cell-derived promoters.

Fig. 6 discloses 10 biologically active mouse VL30 promoters obtained from mouse
liver RNA. These promoters were introduced into the vector of SEQ ID NO: 1. The vectors
were introduced into retroviral helper cells and then packaged into retrovirus where they were
introduced into human liver cells. The cells expressed the green fluorescent protein.

Fig. 7 illustrates a similarity plot of nucleotide sequences found in VL30 U3 regions. 

Fig. 8 illustrates a retro-vector comprising six double-stranded DNA fragments that
were self-assembled into a circular structure using unique overlapping termini created using
class IIS restriction endonucleases. Three templates and twelve primers were used in
conjunction with three class IIS enzymes to make the six fragments that were ligated in a
single step. The vector was efficiently self-assembled and was effectively transmitted by
both DNA transfection as well as by retroviral transduction of the self-assembled DNA,
without molecular cloning through a prokaryotic host (see Example 2).

BRIEF SUMMARY OF THE INVENTION 
The invention described herein provides seamless, directional, ordered
construction of complex DNA molecules, vectors and libraries. More particularly, it enables
gene constructs to be assembled with greater efficiency and precision, and it enables multiple
gene fragments to be assembled in the correct order and orientation without disturbing the
internal structure of the gene. The method utilizes in vitro assembly of nucleic acid
fragments and relies upon the unusual ability of certain enzymes to digest nucleic acid
molecules at pre-determined sites without disrupting the structure of the gene. It is especially
useful for the construction of genetic vectors for gene therapy or genetic engineering of cells
and organisms. A particular application of the invention is in combinatorial, or evolutionary,
genetics, where it enables a large number of non-random, self-assembled constructs to be
screened simultaneously for function.

In a preferred embodiment of this invention, the invention relates to a method
for assembling a gene or gene vector comprising the steps of: a) designing at least 6
primers to amplify at least three fragments in at least three separate polymerase
chain reactions wherein each primer comprises at least one predetermined restriction
endonuclease recognition site recognized by a restriction endonuclease that cleaves at a
distance from the recognition site, a sequence complementary to a template nucleic acid for
amplification, and bases positioned at the restriction endonuclease cleavage site that are
selected to be complementary to only one other overhanging terminus created from enzymatic
cleavage of the fragments; b) combining the primers with template nucleic acid and performing
the polymerase chain reaction to produce multiple copies of an amplified template fragment
incorporating the restriction endonuclease recognition site; c) digesting the amplified
template fragments with one or more restriction endonucleases that recognize the restriction
endonuclease recognition site of the primers to create overhanging termini wherein each
overhanging terminus is complementary to only one other overhanging terminus on another
fragment; and d) combining the amplified and digested template fragments in a ligation
reaction to produce a directionally ordered gene, nucleic acid fragment or gene vector.

In a preferred aspect of this embodiment, the restriction endonuclease is at
least one class IIS restriction endonuclease and preferably, the class IIS restriction
endonuclease is selected from the group consisting of: AlwI, Alw26I, BbsI, BbvI, BbvII, BpmI,
BsmAI, BsmI, BsmBI, BspMI, BsrI, BsrDI, Eco57I, EarI, FokI, GsuI, HgaI, HphI, MboII,
MnlI, PleI, SapI, SfaNI, TaqII, and Tth111II. Still more preferably, class II restriction
endonuclease recognition sites (to be distinguished from class IIS restriction endonuclease
recognition sites), linkers, or adapters are not used to create the gene or gene vector. In one
embodiment, the product of the ligation reaction is introduced into prokaryotic or eukaryotic
cells. Preferably, at least one template nucleic acid sequence is chosen from the group
consisting of: transcriptional regulatory sequences; genetic vectors; introns and/or exons;
viral encapsidation sequences; integration signals intended for introducing nucleic acid
molecules into other nucleic acid molecules; retrotransposon(s); VL30 elements; or multiple
allelic forms of a sequence.

In another preferred aspect of this embodiment, the method is used to generate 
combinatorial libraries of a target sequence. Preferably, the target sequence is part or all of a 
gene. In one embodiment, the gene encodes a protein. In one embodiment, the primers 
amplify allelic variants of part or all of a gene. 

In still another preferred aspect of this embodiment, the product of the ligation
reaction is passed between eukaryotic cells using a virus particle, by cell fusion, or by
transfection. Preferably the product of the ligation reaction is not introduced into prokaryotic
cells. Moreover, the method further comprises combining at least one screening or selection
step to select the products of the ligation reaction. In one embodiment, the product of the
ligation reaction is mutated during passage in cells in order to generate genetic diversity and
preferably the product of the ligation reaction is mutated by homologous recombination
during passage in cells.

In another aspect of this embodiment, the method is used to isolate and
identify regulatory sequences from a cell. In another aspect of this embodiment, cells
containing the product of the ligation reaction are selected for enhanced biological activity.
Preferably, the cells containing the product of the ligation reaction are selected for
tissue-specific, hormone-specific or developmental-specific gene expression. Also preferably, the
product of the ligation reaction is a circularized gene vector.

In another embodiment of this invention, the invention relates to a nucleic acid
primer having a 5' and a 3' end to amplify a nucleic acid fragment for the ligation of at least
two fragments comprising: a restriction endonuclease recognition site recognized by a
restriction endonuclease, wherein the restriction endonuclease cleaves at a distance from the
recognition site and creates overhanging termini; a sequence complementary to a template
sequence to be amplified to produce the nucleic acid fragment; at least two nucleic acid bases
positioned at the restriction endonuclease cleavage site and that form an overhanging
terminus after cleavage by the restriction endonuclease, wherein the at least two nucleic acid
bases are selected to be complementary to only one other overhanging terminus on another
fragment of the ligation; and an affinity handle on the 5' end of the primer. Preferably the
primer further comprises an anchor to provide stability to the restriction enzyme at the
restriction enzyme recognition site.

In yet another embodiment of this invention, the invention relates to a method
for isolating and identifying promoters comprising the steps of: a) obtaining a vector
comprising at least a portion of a promoter region from a retrovirus transposon LTR and
having two non-complementary overhanging termini; b) designing at least two PCR primers
to amplify at least one region of a retrovirus transposon LTR from template nucleic acid to
produce at least one nucleic acid fragment wherein each primer comprises at least one
predetermined restriction endonuclease recognition site recognized by a restriction
endonuclease that cleaves at a distance from the recognition site, a sequence complementary
to a template sequence from a retrovirus transposon, and bases positioned at the restriction
endonuclease cleavage site that are selected to be complementary to only one other
overhanging terminus of the vector wherein the restriction endonuclease cleavage site is
created from enzymatic cleavage of the fragments; c) combining the primers with template
nucleic acid and performing a polymerase chain reaction to produce multiple copies of an
amplified template fragment incorporating the restriction endonuclease recognition site; d)
digesting the amplified template fragments with one or more restriction endonucleases that
recognize the restriction endonuclease recognition site of the primer to create overhanging
termini; and e) combining the amplified and digested template fragment in a ligation reaction
with the vector to produce a gene vector with an intact LTR sequence. In one embodiment of
this aspect of the invention, the template nucleic acid is DNA or RNA. In another
embodiment of this aspect of the invention, the method further comprises the step of
sequencing the insert to identify the promoter sequence. In one embodiment, promoter
sequences of SEQ ID NOS:1-13 are identified using the methods of claim.

Detailed Description of the Invention

In one embodiment of this invention, the invention relates to the seamless,
oriented self-assembly of at least three DNA fragments having overlapping unique cohesive
ends generated by the enzymatic cleavage of at least one restriction endonuclease that is
capable of cleaving at a site distant to the restriction enzyme recognition site. Preferably the
restriction endonucleases employed in this invention are class IIS restriction endonucleases.
These enzymes recognize a predetermined group of nucleotides and cleave at a distance
characteristic of the particular endonuclease from the recognition site. The term "unique
cohesive ends" is used herein to refer to the notion that the cleavage site for the
endonucleases of this invention can be manipulated to produce overhanging ends with unique
termini selected by the investigator. The term "complementary" as used herein in reference
to the overhanging ends of the fragments of this invention refers to standard complementarity
recognized in the field of molecular biology. For example, the nucleotide sequence
5'-TAG-3' is said to be complementary to the nucleotide sequence 5'-CTA-3'. The term "PCR" is
used generally to refer to the polymerase chain reaction and its variations, including RT-PCR
as well as other gene amplification techniques employing primers.
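
The definition of complementarity given above (5'-TAG-3' pairing with 5'-CTA-3') can be checked mechanically. The following is a minimal sketch in Python, offered only as an illustration; the function names are hypothetical and do not appear in this specification.

```python
# Illustrative helpers only; the names are assumptions, not part of the specification.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def are_complementary(a: str, b: str) -> bool:
    """True if strands a and b (both written 5'->3') base-pair along their full length."""
    return reverse_complement(a) == b.upper()


# The example from the text: 5'-TAG-3' and 5'-CTA-3'.
print(are_complementary("TAG", "CTA"))
```

Both strands are written 5' to 3', so the comparison is against the reverse complement rather than the simple base-by-base complement.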

In a first step for practicing one embodiment of this invention, a series of at
least three overlapping fragments are created through the selection and creation of primers
incorporating at least one class IIS restriction enzyme recognition sequence. The
oligonucleotide primers of this invention are designed to amplify one or more nucleic acid
fragments and comprise a sequence complementary to a target sequence for gene
amplification, a recognition sequence for a restriction endonuclease that cleaves DNA at a
distance from the recognition sequence (such as a class IIS restriction enzyme) and bases
positioned at the restriction endonuclease cleavage site that are preferably unique and
complementary to only one other overhanging terminus in the annealing/ligation reaction that
generates the complex nucleic acid molecules. Optionally, the primers of this invention can
include an "affinity handle for cleanup" at the 5' end. These sequences can be of any length,
preferably at least about 6 bp, and the sequences extend the primer in the 5' direction from the
restriction enzyme recognition site. This extra length gives many enzymes greater stability
and improved activity. In addition, the sequence can be used for recognition and removal of
the ends of the primers (either undigested fragments or digested ends of primers) using
complementary nucleotide sequences bound to a solid support (such as cellulose,
nitrocellulose or silica). Incubation with, or passage over, a column or support containing the
complementary sequences can be used to remove the tags by allowing them to anneal or
hybridize. The nucleic acid can then be eluted from the column. Adapters can also be used in
this invention. For purposes of this invention, adapters refer to double stranded fragments
containing an enzyme recognition site, according to this invention. The adapters are ligated
to double stranded DNA molecules, creating a fragment analogous to a PCR fragment with
similar sites derived from a primer. The primers or adapters can be prepared using a number
of methods for synthesizing oligonucleotides known in the art. For example, instruments for
producing oligonucleotides are available from Applied Biosystems, Inc., Foster City, CA.

In one example, for the design of an oligonucleotide primer for use in this
invention, the particular complementary bases that will form the site for hybridization of the
primer to template (i.e., target DNA or RNA) are selected. A restriction endonuclease
recognition site is selected followed by a number of nucleotides to be positioned between the
recognition site and the cleavage site. The nucleotides of the cleavage site are selected to
include overhanging regions formed from the restriction endonuclease cleavage that are
complementary to the overhanging regions of an adjacent fragment in the annealing/ligation
reaction.

The length of the primer used in this invention can vary, but preferably the
primer length is up to about 80 bases and more preferably up to about 50 bases. In addition, the
primers are preferably at least about 15 bases in length and more preferably at least about 25 bases
in length. The 5' region of the primer contains preferably at least about 6, preferably at least
about 10 and still more preferably at least about 16-18 bases that are not complementary to
the template DNA or RNA. Further, the primer incorporates a restriction endonuclease
recognition site preferably 5' to the region of complementarity and a restriction endonuclease
digestion site preferably 5' to the region of complementarity or within the region of
complementarity. There are a variety of restriction endonucleases that cleave at a distance
from the restriction endonuclease recognition site of a DNA strand, and a variety of enzymes
that are commercially available from New England Biolabs are provided in Table 1.

Table 1. Restriction endonucleases useful in the construction of self-assembling 
genes 



Enzyme    Site size (bp)   Distance to overlap   Size of overlap   Overlap type

Alw26I    5                1-5 bp                4 bp              5'-overhang
BbsI      6                2-6 bp                4 bp              5'-overhang
BpmI      6                16-14 bp              2 bp              3'-overhang
BsmBI     6                1-5 bp                4 bp              5'-overhang
BspMI     6                4-8 bp                4 bp              5'-overhang
BsrDI     6                0-2 bp                2 bp              3'-overhang
Eco57I    6                16-14 bp              2 bp              3'-overhang
FokI      5                9-13 bp               4 bp              5'-overhang
HgaI      5                5-10 bp               5 bp              5'-overhang
HphI      5                8-7 bp                1 bp              3'-overhang
MnlI      5                7-6 bp                1 bp              3'-overhang
PleI      5                4-5 bp                1 bp              5'-overhang
SapI      7                1-4 bp                3 bp              5'-overhang
SfaNI     5                5-9 bp                4 bp              5'-overhang



In addition to the enzymes provided in Table 1, other restriction endonucleases
that cleave at a distance from their restriction endonuclease recognition site include, but are
not limited to, AlwI, BbsI, BbvI, BbvII, BsmAI, BsmI, BsrI, EarI, GsuI, MboII, TaqII,
Tth111II and their respective isoschizomers. These and other enzymes are known in the art
and many are available from other manufacturers. The primers can be prepared to produce
either 5'-overlapping ends or 3'-overlapping ends, as long as both are either
5'-overlapping ends or 3'-overlapping ends and are complementary to one other set of
overlapping ends.

In the case of BpmI (see Example 1), the enzyme digests asymmetrically,
14-16 bp from the 3'-nucleotide of the recognition site. The resulting cleavage has a
3'-overhanging end of 2 bp. A second primer is then designed with a complementary
overhanging end, and it is used to generate the adjoining fragment terminus. At the opposite
ends of the two fragments that are to be joined, similar complementary, overhanging ends are
designed.
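
The asymmetric BpmI cut described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a description of the procedure of Example 1: the recognition sequence CTGGAG is taken from enzyme supplier catalogs and is not recited in this specification, the function name and the example sequence are hypothetical, and only a single site in the forward orientation is handled.

```python
# Sketch of a type IIS digest that leaves a 2-base 3' overhang.
# Assumptions (not from the specification): the site string "CTGGAG" and all names below.

def digest_3prime_overhang(top, site="CTGGAG", top_offset=16, bottom_offset=14):
    """Cut a duplex, given as its top strand (5'->3'), at the first forward-oriented site.

    Returns (left_top, right_top, overhang):
      left_top  - top strand of the site-bearing fragment, including its 3' overhang
      right_top - top strand of the released fragment
      overhang  - the single-stranded 3' extension left on the site-bearing fragment
    """
    start = top.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    site_end = start + len(site)
    top_cut = site_end + top_offset        # top strand is cut just before this index
    bottom_cut = site_end + bottom_offset  # bottom strand is cut opposite this index
    if top_cut > len(top):
        raise ValueError("sequence too short downstream of the site")
    return top[:top_cut], top[top_cut:], top[bottom_cut:top_cut]


# Hypothetical primer-derived end: handle, site, 14 spacer bases, then the chosen dinucleotide.
example = "GATTACA" + "CTGGAG" + "A" * 14 + "TC" + "GGGG"
left, right, overhang = digest_3prime_overhang(example)
print(overhang)
```

Because the top strand is cut two bases farther from the site than the bottom strand, the two bases between the cut positions remain single-stranded on the site-bearing fragment; the partner fragment carries the complementary dinucleotide on its bottom strand.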

The oligonucleotides are then combined with template nucleic acid (either
DNA or RNA, e.g., such as for reverse transcriptase polymerase chain reaction (RT-PCR))
containing bases complementary to at least a 3' portion of the primers (also referred to herein
as "templates"). In one embodiment, the fragments are gene-amplified by PCR, RT-PCR or
another gene amplification process using established PCR protocols such as those provided
with PCR amplification kits, including those available from Perkin-Elmer Corp. (Emeryville,
California). Preferably, the PCR products are analyzed by electrophoresis on a gel, such as
an agarose gel, and still more preferably the fragments of the predicted size are purified free of
excess primers and small byproducts (such as by purification through a small column, such as
a Qiagen™ column (Qiagen, Valencia, CA)). Following amplification or purification, the
fragments are digested with the restriction endonuclease recognizing the restriction
endonuclease recognition site in the primers. The digested fragments are then purified from
the digested ends of the primers, preferably by preparative agarose gel electrophoresis. The
fragments are combined, annealed and ligated using standard hybridization and ligation
conditions known for cloning (see Ausubel et al., Current Protocols in Molecular Biology,
John Wiley & Sons, 1994).

Fig. 1A illustrates an example of a self-assembling gene construct (SEQ ID
NO:1) comprising six fragments, each having unique overhanging dinucleotide ends. In this
example, the ends of the fragments prepared by the methods of this invention are constructed
using primers that include BpmI restriction endonuclease recognition sites. It will be
understood by those of ordinary skill in the art that one or more other restriction
endonucleases (such as those of Table 1) could similarly be used for the self-assembling
product of Fig. 1A. In a preferred embodiment, the primers were created as described above
and preferably the 5' ends of the primers are non-palindromic (i.e., non self-complementary)
to prevent self-annealing of such fragments. Each fragment in this example preferably joins to
only one other dinucleotide overhang in the annealing/ligation mixture, assuring ligation only
to the intended fragment partner. An advantage of this strategy is that the formation of
concatamers or multimers is minimal. The restriction endonuclease site is removed by
digestion with the restriction endonuclease, leaving the junction free of the extra DNA
sequences associated with the site.
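
The ordering principle described above, in which each overhang pairs with exactly one other overhang in the mixture, can be illustrated with a short Python sketch. The fragment names and dinucleotide assignments below are hypothetical and are not those of SEQ ID NO:1; each overhang is written 5' to 3' as the single strand exposed at that end, and fragments are assumed to be entered in their intended orientation.

```python
# Illustrative model of ordered self-assembly; names and overhangs are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def check_unique_pairing(fragments):
    """Raise ValueError unless every overhang has exactly one partner and none is palindromic."""
    ends = [oh for pair in fragments.values() for oh in pair]
    for oh in ends:
        if oh == reverse_complement(oh):
            raise ValueError(f"palindromic overhang: {oh}")
        if ends.count(oh) != 1 or ends.count(reverse_complement(oh)) != 1:
            raise ValueError(f"overhang {oh} does not have exactly one partner")


def assembly_order(fragments, start):
    """Follow each right-hand overhang to the fragment whose left-hand overhang pairs with it."""
    check_unique_pairing(fragments)
    left_index = {left: name for name, (left, _right) in fragments.items()}
    order, current = [start], start
    while True:
        partner = left_index.get(reverse_complement(fragments[current][1]))
        if partner is None or partner == start:  # linear end reached, or circle closed
            return order
        order.append(partner)
        current = partner


# Six fragments forming a circle; dict order is deliberately scrambled.
fragments = {
    "F4": ("TG", "CC"), "F1": ("TT", "AC"), "F6": ("TC", "AA"),
    "F2": ("GT", "AG"), "F5": ("GG", "GA"), "F3": ("CT", "CA"),
}
print(assembly_order(fragments, "F1"))
```

With the twelve non-palindromic dinucleotides forming six complementary pairs, the six junctions of the circle use each pair once, which is consistent with the six-fragment assembly of Fig. 1A.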

Using a single restriction endonuclease with a dinucleotide overhang (for 
example, using the enzyme Bpml) up to six pieces of genetic material can be joined together 
5 ma linear or circular form (such as a vector) without the need to perform sub-cloning 
procedures or detailed analysis of individual products because six unique combinations of 
dinucleotide overhangs create a directional clone with extremely high fidelity. With enzymes 
digesting single-base overlaps, only two fragments can be joined with positional and 
directional precision. With en^mes digesting three-base overlaps^ 4V2, or 32 fragments can 

10 be so joined in the correct order and orientation. Therefore, this invention also relates to the 
use of restriction endonuclease recognition sites that facilitate cleavage by restriction 
endonucleases with three-base overlaps and self-assembly gene constructs including 32 
fragments. Alternatively, a combination of restriction endonuclease recognition sites for use 
with a combination of restriction enzymes that create two-base or three-base overlaps can be 

IS used. Each enzyme has its characteristic limits to self-assembly imposed by the size of the 
overlap. For example, there are sixteen dinucleotides, therefore Bpml fragments (which have 
two dinucleotide ends each) are limited to eight for the purpose of self-assembly; therefore in 
another embodiment of this invention an assembly comprising eight fragments is 
contemplated. However, four of the sixteen dinucleotides are palindromes. Use of these 

20 palindromic dinucleotides can create some infidelity in the annealing/ligation reaction. The 
enzyme Hgal has a five base overlap, and there are 1,024 pentanucleotide combinations, 
permitting 512 fragments to be ligated together directionally and in order (no palindromes). 
The fragments to be joined at a particular place are designed to have their cut sites aligned, so 
that the overlapping region fits together. In some cases, the target sequences will contain
natural restriction endonuclease recognition sites for the enzyme that is being used, such as
one or more internal BpmI sites. These sites have the potential to self-religate during vector
or gene construction or they can be bypassed by using a substitute enzyme in the primers (for
example, Eco57I can substitute for BpmI). Alternatively, these sites can be removed by
site-directed mutagenesis after consideration of the consequences of the mutagenized sequence
for the gene or vector.
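
The fragment limits stated above can be checked by enumeration. The following is a minimal Python sketch and not part of the disclosed method; it assumes that overhangs are written 5' to 3', that each junction consumes one overhang together with its reverse complement, and that self-complementary (palindromic) overhangs are set aside. The function names are illustrative only.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def overhang_capacity(n):
    """Return (total, palindromic, usable_pairs) for overhangs of length n.

    usable_pairs counts complementary pairs of non-palindromic overhangs,
    i.e. the number of junctions that can each receive a unique overhang.
    """
    overhangs = ["".join(p) for p in product("ACGT", repeat=n)]
    palindromic = [o for o in overhangs if o == reverse_complement(o)]
    usable_pairs = (len(overhangs) - len(palindromic)) // 2
    return len(overhangs), len(palindromic), usable_pairs


for length in (1, 2, 3, 5):
    print(length, overhang_capacity(length))
```

Under these assumptions, two-base overhangs give sixteen sequences, four of them palindromic, leaving six non-palindromic complementary pairs; three-base and five-base overhangs give 32 and 512 pairs with no palindromes, in agreement with the figures given above.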

In addition to class IIS enzymes, class II restriction endonucleases can be used. 
These enzymes have intrinsic methylation activity that affects the outcome in either a
negative or a positive way, depending on the purpose for which it is used. In a preferred
embodiment, the methylation activity of class II enzymes is ablated by mutation or by genetic 
engineering to convert the enzyme to an effective class IIS enzyme to expand the repertoire of 
useful enzymes for this invention. 

In another aspect of this invention, the primer design and target fragment
sequence selection can be automated (see Example 5) using a computer to assist in the
selection of unique overhanging ends that have complementarity only to the overhanging end 
of an adjacent fragment. 
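
One way such computer-assisted selection might proceed is sketched below in Python. This is an illustrative assumption and not the program of Example 5: the function `pick_junction_overhangs` and its greedy strategy are hypothetical, and overhangs are assumed to be written 5' to 3'. Candidate overhangs would in practice be read from the target sequence at permissible junction positions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def pick_junction_overhangs(candidates, n_junctions):
    """Greedily choose one overhang per junction (hypothetical helper).

    An overhang is rejected if it is palindromic, or if it or its reverse
    complement is already in use at another junction, so that every chosen
    end is complementary only to the end of its intended neighbor.
    """
    chosen = []
    used = set()
    for overhang in candidates:
        partner = reverse_complement(overhang)
        if overhang == partner:
            continue  # self-complementary: could ligate to itself
        if overhang in used or partner in used:
            continue  # would also anneal at another junction
        chosen.append(overhang)
        used.update((overhang, partner))
        if len(chosen) == n_junctions:
            return chosen
    raise ValueError("not enough unique non-palindromic overhangs")


print(pick_junction_overhangs(["AT", "CC", "GG", "GC", "CA", "TG", "AG"], 3))
```

In this sketch AT and GC are skipped as palindromes, and GG and TG are skipped because their complements CC and CA were already assigned.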

Therefore, this invention permits high-fidelity annealing and ligation of six or
more fragments with unique overhanging termini, each complementary to a single other
overhanging terminus. Any multitude of combinations can be created by combining the types
of overhanging termini that can be created. Moreover, if one is willing to sacrifice the
fidelity of the reaction, a variety of combinations can be used to anneal a variety of fragment
numbers. In these cases, some selection may be necessary, such as size selection of the
resulting fragment based on electrophoretic migration or restriction endonuclease profiling,
both methods well known to those of ordinary skill in the art.

It is also necessary to have a high per-step efficiency (e.g., each step in the
process is performed with an efficiency of at least 80%) to effectively ligate large numbers of
fragments without error. Where large numbers of fragments are used, the purity of the
fragments becomes important. This means that for large numbers of fragments, the digested
DNA fragments for annealing and ligation should be substantially pure. If undigested
fragments, digested ends of primers, or degraded or partially degraded molecules are present,
they can decrease the purity and affect the fidelity of the product. Therefore, it is particularly
desirable to ensure complete digestion of both ends of each fragment and to remove all of the
digested ends from the fragments prior to including the fragments in an annealing and ligation
reaction. The use of Qiagen columns for oligonucleotide removal prior to digestion is
generally sufficient to permit efficient digestion of the fragments. Agarose gel isolation is
desirable after digestion, particularly where the product contains some fragments that do not
appear to be full length. The use of an analytical gel before and after digestion helps in
determining whether both oligonucleotide tags have been removed. The isolation of
fragments from agarose gels preferably avoids the use of ultraviolet light, and exposure of the
DNA to ethidium bromide is also preferably avoided. Both exposures can be avoided by
running replicate lanes and staining only a portion of the gel.
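
The importance of per-step efficiency for multi-fragment assemblies can be illustrated numerically. The following Python sketch rests on an assumption not stated in the text, namely that the steps are independent so that their efficiencies multiply; the function names are illustrative.

```python
def overall_yield(per_step_efficiency, n_steps):
    """Fraction of correct product after n_steps, assuming independent steps."""
    if not 0.0 <= per_step_efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return per_step_efficiency ** n_steps


def required_efficiency(target_yield, n_steps):
    """Per-step efficiency needed to reach target_yield over n_steps."""
    return target_yield ** (1.0 / n_steps)


for steps in (3, 6, 10):
    print(steps, round(overall_yield(0.80, steps), 3))
```

Under this simple model an 80% per-step efficiency leaves roughly half of the product correct after three steps and roughly a quarter after six, which is why purity and complete digestion become more important as the number of fragments grows.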

The fragments and vector are then digested to yield fully complementary ends,
and the fragments are preferably again purified, as described above (such as through a Qiagen
column or by gel isolation). The purified fragments are ligated together in a test tube, under
standard conditions, such as using bacteriophage T4 DNA ligase and ATP. Preferred
ligations include at least 20 µg/ml total DNA concentration in the ligation mix to favor
intermolecular interactions, and an equimolar ratio of fragments to be ligated. Where a
prokaryotic intermediary is used, the ligated assemblage is transformed into a bacterium, such
as an E. coli host, and the colonies are selected with a drug (such as for an ampicillin,
tetracycline, or kanamycin marker). The colonies can then be selected either by individually
selecting colonies or growing a mass culture, such as where a vector library has been created.
Restriction enzyme analysis can be used to determine the identity of individual constructs or
to validate the combination of plasmids. The plasmids can then be grown up
and used as needed.
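
The equimolar ratio and total DNA concentration described above translate into a mass for each fragment. The Python sketch below assumes double-stranded fragments of similar base composition, so that mass per mole is proportional to length; the fragment names, lengths, and reaction volume are hypothetical values chosen only for illustration.

```python
def total_mass_for_volume(concentration_ug_per_ml, volume_ul):
    """Total DNA mass (µg) needed to reach a concentration in a given volume."""
    return concentration_ug_per_ml * volume_ul / 1000.0


def equimolar_masses(lengths_bp, total_mass_ug):
    """Split total_mass_ug among fragments so that all are equimolar.

    For equal numbers of molecules, each fragment's mass share is
    proportional to its length in base pairs.
    """
    total_length = sum(lengths_bp.values())
    return {name: total_mass_ug * length / total_length
            for name, length in lengths_bp.items()}


fragments = {"vector": 3000, "insert_a": 1500, "insert_b": 500}  # hypothetical
total = total_mass_for_volume(20.0, 50.0)  # 20 µg/ml in a 50 µl ligation
for name, mass in equimolar_masses(fragments, total).items():
    print(name, round(mass, 3), "µg")
```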

In one embodiment of this invention, at least a portion of a vector is used as
one of the fragments for the ligation of at least three fragments according to this invention. In
one example, where a vector is used as one of the starting fragments, two restriction
endonuclease recognition sites for an enzyme that cleaves at a distance from the
recognition site, such as at least one BpmI site, can also be introduced into the vector. This
permits the vector to be digested with the restriction endonuclease to produce a product
having ends complementary to two ends of the insert DNA fragments. The vector can be
made by amplifying a plasmid or portion thereof using the primers of this invention. Thus,
the vector can also be constructed to include a variety of restriction endonuclease recognition
sites using a variety of restriction endonucleases, including a variety of class II restriction
endonucleases. In some cases, the target fragments for amplification will contain natural
restriction endonuclease recognition sites for the enzyme that is being used for the
self-assembly, such as, for example, a fragment that includes one or more internal BpmI sites.
Care should be taken either to utilize the complementarity of the naturally occurring site to
reform the fragment as it originally existed or to eliminate the restriction endonuclease
recognition site using, for example, site-directed mutagenesis. Preferably, the restriction
endonuclease recognition site is replaced with that of a different enzyme (in the case of BpmI,
substituting Eco57I or BsrDI) that has an equivalent structure at its ends. Two or more
fragments of insert or two or more fragments of vector with at least one insert are amplified
using primers according to this invention.

The exemplary enzyme, BpmI, digests DNA 14-16 base pairs (bp) from the
3'-nucleotide of the recognition sequence (RS). Thus, by placing the RS exactly 14-16 bp from
the desired dinucleotide cut site, the practitioner tags the dinucleotide for ligation with
another dinucleotide that is exactly complementary to it. Such a complementary dinucleotide
can be inserted by using the same enzyme and RS to make another fragment which fits the
first exactly, as illustrated in Fig. 1. Because there are sixteen possible dinucleotide
combinations (including twelve combinations that do not have palindromic ends), it is
possible to create up to six fragments with unique dinucleotides, and it is also possible to join
them all together in a predetermined order and orientation (Fig. 1A). In addition, the
palindromic sequences (such as AT, CG, TA, and GC) could also be used, although
inefficiency and incorrect ligation will result from the self-complementarity of these
sequences. It is furthermore possible and desirable to have three or more fragments joined in
this way, such that the construct is circular as in Fig. 1, comprising a vector that may be
grown in a bacterial and/or eukaryotic host cell. If the genetic construct is to be used as a
vector, the vector should be designed to include a proper origin of replication to enable it to
replicate in a particular cell. For example, a prokaryotic origin of replication such as a
coliform plasmid origin of replication enables circular DNAs to be propagated in E. coli host
cells. It is desirable to have at least one selectable marker, such as a neomycin marker, that
enables recovery of the clone through a selection process. It is also desirable, but not
essential, to have two or more selectable genetic elements, to permit dual selection. For
example, if one of the fragments contains a prokaryotic plasmid origin of replication, and
another fragment contains a selectable marker, then the two fragments are both selectable,
since the construct will grow in prokaryotic cells in the presence of a selection drug (such as
ampicillin) only when both fragments are present. Drug selection can be combined with the
methods of directed self-assembly to assure a high percentage of correct products. Because
of the unique complementarity of the fragments, each contributes a selectable element that
leads to recovery of a high percentage of correct products.
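
The placement of the recognition sequence relative to the cut site, described at the start of the preceding paragraph, can be sketched in Python. The sketch assumes the commonly published specification for BpmI, namely the recognition sequence CTGGAG with cleavage 16 nucleotides downstream on the top strand and 14 on the bottom strand, which is not spelled out in this form in the text; the function name and the demonstration sequence are hypothetical, and only a site on the top strand is searched.

```python
BPMI_SITE = "CTGGAG"  # assumed recognition sequence, top strand, 5' to 3'
TOP_CUT = 16          # assumed cut offset on the top strand
BOTTOM_CUT = 14       # assumed cut offset on the bottom strand


def bpmi_overhang(seq):
    """Return the dinucleotide left as a 3' overhang, or None.

    The overhang is the top-strand sequence lying between the bottom-strand
    and top-strand cut positions downstream of the first recognition site.
    """
    seq = seq.upper()
    start = seq.find(BPMI_SITE)
    if start == -1:
        return None
    site_end = start + len(BPMI_SITE)
    top = site_end + TOP_CUT
    bottom = site_end + BOTTOM_CUT
    if top > len(seq):
        return None  # the cut would fall beyond the end of the molecule
    return seq[bottom:top]


# Hypothetical primer-tagged end: 2 nt clamp, site, 14 nt spacer, then "GT".
demo = "GA" + "CTGGAG" + "ACGTACGTACGTAC" + "GT" + "CCATGG"
print(bpmi_overhang(demo))
```

Moving the recognition sequence by one nucleotide in the primer moves the tagged dinucleotide by one nucleotide, which is how the junction is positioned exactly.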

For prokaryotic vector construction, at least one fragment should contain a
prokaryotic origin of replication and one fragment should contain a drug resistance marker
gene. However, an advantage of the methods of this invention is that the construct can be
introduced directly into eukaryotic cells. Here no plasmid origin of replication is necessary
and no prokaryotic selectable marker or other prokaryotic nucleic acid sequence is necessary.
In cases where the vector is subject to regulatory approval or where optimal gene function is
necessary, it may be undesirable to include prokaryotic sequences, such as extraneous
plasmids or expressed prokaryotic fragments, particularly if the sequences contain
immunostimulatory sites that can lead to activation of the intracellular immune system and
inactivation of a gene product (see Krieg et al., J. Lab. Clin. Med., 128:128-133, 1996), and it
may be desirable to avoid risks of endotoxin contamination. Moreover, the use of self-assembled
product according to the methods of this invention saves labor and time involved in the
screening process.

Thus, in a preferred embodiment of the invention, the nucleic acid fragments
are self-assembled in vitro, and are transferred directly into eukaryotic cells, by transfection,
injection, or other methods known in the art. In one embodiment the cells receiving the
assembled product of this invention are helper cells for recombinant virus assembly
(including, but not limited to, retroviral helper cells for retroviral or retrotransposon vectors,
adenovirus helper cells for adenovirus vectors or herpes simplex virus helper cells for herpes
simplex vectors). Alternatively, the assembled product can be introduced into cells along
with a helper virus or the assembled product can be introduced into target cells for direct
expression. The assembled product can be a vector, a minichromosome vector, a portion of a
chromosome, or the like. In the preferred case of a retroviral vector, the genes are first
transfected into a first helper cell line (such as ecotropic helper cells, GP+E86 (Markowitz et
al., J. Virol. 62:1120-1124, 1988)). The retrovirus-containing supernatant from these cells is
then filtered (0.45 µm Nalgene filters) preferably 48-72 hours after transfection and the
filtrate is transferred to a second complementation retroviral helper cell line (such as PA317
retroviral helper cells, Miller et al., Mol. Cell. Biol. 6:2895-2902, 1986). After an additional
48 h, the second helper cell line is selected with the marker drug (such as the drug G418 for
the selectable neomycin (neo) marker gene), until only drug-resistant cells remain. These
cells contain stably integrated vectors that can be used to repeatedly transduce human cells.
Advantageously, in the case of adenovirus vectors or other large eukaryotic-derived vectors
including eukaryotic virus-derived vectors, it may be impossible to propagate them in 
prokaryotic hosts. The gene self-assembly method of the instant invention provides an 



wo 98/38326 PCT/US98/03918 

alternative to in vitro recombination methods of gene construction by permitting large
constructs to be assembled.

One advantage of introducing the assembled product of this invention into a
helper cell line to produce recombinant virus for the introduction of a gene or nucleic acid
complex into a cell is that the assembled product will be auto-selected by the cells during the
packaging process. Therefore, even where the overhanging termini have palindromic
sequences, where there is more than one (but preferably fewer than four) unique
complementary match for a particular overhanging terminus, or where concatamers have
formed, only the correct or functional assembled products are expressed, transmitted, and
assembled into virus. When the virus is then introduced into cells, the use of a reporter gene
or another selectable marker provides yet a second layer of security for the selection of cells
containing a properly assembled construct. For example, where a retrovirus helper cell line is
used to produce a recombinant retrovirus containing the product of this invention (for
retrovirus, RNA transcribed from the DNA product of the invention becomes packaged into
the virus particle), a retrovirus-derived vector is transcribed as RNA and transmitted by
packaging the RNA in a retrovirus particle. In order to be properly transmitted as a virus, the
construct must be: 1) transcribed as RNA in a vector producer cell; 2) packaged into viral
particles; 3) reverse transcribed into double-stranded DNA (in the recipient cell); and 4)
integrated into the host chromosome. Each of these steps requires specific cis-acting
sequences that must be correctly positioned within the vector. Thus, passage via retrovirus
(or by other virus) is a means of auto-selection for the essential sequences.

In one application of the methods of this invention, the methods are used to 
rescue expressed sequences from RNA, or genomic sequences from cell DNA without 
disrupting the promoter sequences. Cellular transcriptional promoters are typically difficult
to identify and isolate because they are generally not included in the RNA molecule and often
extend over a considerable distance in a chromosome. One application of this invention
relates to a promoter rescue technique that permits the entire promoter, or a fragment of a
promoter, to be isolated and cloned directly into an expression vector without disruption of
the flanking sequences. Promoter rescue techniques are known and include WO 94/20608 to
Hodgson.

In a preferred embodiment of the invention, transcriptional promoters are 
cloned in a transcriptionally active manner for the selection and identification of new and/or
tissue- or cell-specific promoters, enabling them to be used, selected, or screened for activity
directly. For example, Fig. 3 illustrates one example of the formation of a vector for the
incorporation of promoter sequences and the ultimate identification of those sequences using
an exemplary plasmid VLBPGN (SEQ ID NO:1) as provided in Example 3, with BpmI sites
located within the locus of a retrotransposon (VL30) long terminal repeat (LTR). These
methods preserve the structure and functionality of transcription factor response elements.
The characteristic secondary structure of the LTR RNA remains very similar to the original
LTR from which the promoter was rescued, thus preserving the important features of the
original RNA/DNA molecule. Those of ordinary skill in the art will recognize that any of a
variety of primers can be used with a variety of vectors and that the constructs of Figs. 2 and 3
are exemplary and not limiting.

Fig. 2 illustrates the primers used to amplify the promoter insert (identified at
a and c in Fig. 2), and the insert region of the LTR (boxed), both of which can be digested at
the same nucleotide position with BpmI, to ensure a proper and seamless fit. In this example,
after digestion of the vector, the two BpmI sites leave non-complementary ends (a 3'-CC
overhang on one end, and a 3'-GC overhang on the other). Thus, the ends will not efficiently
anneal or ligate to one another. However, the complementary termini of the insert serve as a
linkage, enabling the plasmid to be completed by ligation.

In the example illustrated in Fig. 2, the terminus on the 3'-side (GC) is
palindromic. Palindromic termini are self-complementary and can therefore ligate to
themselves or to an identical terminus facing the opposite way (forming concatamers in the
opposite direction). Despite the presence of palindromic termini and despite the potential for
reduced fidelity in the self-assembling process, a large percentage of clones obtained by
inserting promoter sequences into VLBPGN were assembled correctly (20/23). These levels
are reduced somewhat when three or more fragments are combined for self-assembly
according to this invention, and preferably the use of palindromic termini is avoided when
even numbers of nucleotides are exposed as overhanging termini, because with even numbers
of nucleotides there is an axis of symmetry. As noted above, where five base overhangs are
used there are 1024 possible combinations of five nucleotides (4^5), yet none of them is
palindromic.
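
The behavior of the two vector ends in the Fig. 2 example can be checked with a few lines of Python. This is an illustrative sketch only; it assumes that both overhangs are written 5' to 3' and that two overhangs anneal when one is the reverse complement of the other, and the function names are not taken from the specification.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_anneal(overhang_a, overhang_b):
    """True if the two single-stranded overhangs are fully complementary."""
    return overhang_a == reverse_complement(overhang_b)


def is_palindromic(overhang):
    """True if the overhang can anneal to a copy of itself."""
    return can_anneal(overhang, overhang)


print(can_anneal("CC", "GC"))   # the two vector ends of the example
print(can_anneal("CC", "GG"))   # CC with an insert end carrying GG
print(is_palindromic("GC"), is_palindromic("CC"))
```

In this model the CC and GC ends cannot close on each other, the CC end accepts only a GG partner, and the GC end is self-complementary, which is the source of the reduced fidelity discussed above.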

The vector of Fig. 3 is an example of a particular type of vector that is known
as a retrotransposon vector. Retrotransposon vectors are described and reviewed in Hodgson
et al., 1996, Retro-Vectors for Human Gene Therapy, RG Landes Company, Austin TX,
chapter 5; see also US Patent 5,354,674 to Hodgson. This type of vector is derived from a
mouse cellular retro-transposon element that has no essential viral or cellular genes, and that
has little sequence similarity to a retrovirus. However, this RNA (known as VL30 [virus-like,
30S]) has all the necessary cis-acting structural elements (such as LTRs and primer binding
sites) required for efficient transmission by a type C murine or primate retrovirus. Thus, it is
a parasite transmitted by retroviruses that is also expressed as a cellular RNA in most mouse
cells and tissues. This RNA becomes packaged into retroviral particles when the mouse cells
become infected by retrovirus. The retrovirus then transmits the VL30 (or a VL30 vector) to
the next infected cell (which can be a human cell). The RNA is then reverse transcribed and
integrated into the DNA of the host cell.

Some advantages of VL30 vectors (over retrovirus-derived vectors) are: 1)
lack of viral genes and other sequence homology that could lead to replication competent
retrovirus (RCR); 2) ability to be expressed long-term in vivo; 3) a variety of LTR
transcriptional promoters that can be expressed in various tissues and under the influence of
various hormones and other stimuli; and 4) the ability to express genes in a number of cell
types that are targets of gene therapy. An additional advantage is that VL30 parts can be
switched with those of classical retrovirus-derived vectors. For example, the LTR or
packaging signal of VL30 can be used in place of the equivalent retroviral signal. The ability
to make mixed, or chimeric, retro-vectors is a special application of gene self-assembly
technology.

Using a specific primer set, such as that shown in Fig. 2, or others, as taught in
this invention, it is possible to amplify the U3 sequences expressed in the RNA of many
different types of mouse cells. This is done using standard RNA isolation methods (Ausubel
et al., supra), coupled with extensive digestion with ribonuclease-free deoxyribonuclease, to
eliminate residual DNA. Thus, to obtain a promoter that is expressed in the liver, one isolates
RNA from liver and uses an RT-PCR procedure, such as those known in the art, with the
primers to amplify the desired promoters. Fig. 6 illustrates liver RNA-derived promoters
obtained using the methods of this invention. However, the promoters can also be derived by
conventional PCR from cDNA libraries (Fig. 5 illustrates T cell-derived promoters that were
obtained in this manner). It is also possible to use the well-known hormonal and
pharmacological inducibility of VL30 LTRs to find LTRs that are responsive to peptides,
hormones, and cytokines (for a table and description of VL30 pharmacologic responses, see
Hodgson et al., 1996, Retro-Vectors for Human Gene Therapy, RG Landes Company, Austin
TX, chapter 4, and Fig. 4.2). Examples of substances inducing various VL30 promoters to
high levels include: epidermal growth factor, basic fibroblast growth factor, insulin,
erythropoietin, glucocorticoid hormones, activators of cyclic 3'-5' AMP, and others. To
rescue promoters with pharmacological responsiveness, cells or animals stimulated with the
desired pharmacological agent are subjected to the RT-PCR procedure and the resulting U3
regions are cloned into a vector (such as the exemplary VLBPGN) and are tested for
inducibility. Standard RNA blotting procedures can be used before isolating VL30
promoters, to determine whether a particular drug or hormone causes induction of VL30
RNA expression in a particular mouse cell or tissue. After the promoter has been rescued, the
vector is transmitted via retrovirus to the target cell (possibly a human equivalent of the
mouse cell from which the promoter was rescued). After selection with the drug G418
(400-700 µg/ml, for 7-10 days) to select against cells not containing the vector, the target cell
population is challenged with the pharmacological agent of choice. Reporter gene expression
(in the example, OFF) or RNA expression, as determined by RNA blotting, can be used as an
assay of gene inducibility by the agent (for exemplary gene expression methods, see
Chakraborty et al., Biochem. Biophys. Res. Commun. 209:677-683, 1995).
Using any specific primer set designed for use with VL30 retro-elements and
using total cellular RNA from a particular mouse cell type as a template for RT-PCR (using
commercially available kits and methods therein), candidate promoter elements can be
amplified. This method is useful for the identification of mouse-derived promoters and in
particular the method is useful for the identification of cell-type specific or tissue-specific
promoters from a mouse and for the selection of these promoters and the identification of
tissue-specific or cell-specific promoters that function in human cells. Thus, these types of
vectors and the methods for using these vectors permit the identification of promoters to
permit controlled transcription of a foreign gene. The promoters, originally obtained from the
mouse, can be used to effect tissue-specific or cell-specific expression in a human or animal
liver cell such as a hepatocyte, or in a human blood cell such as a T-helper cell or in an
erythrocyte (red blood cell). Methods are disclosed in Example 2 for the screening and
selection of the promoters from a library of amplified promoter sequences. Other methods
are well known to those of ordinary skill in the art. The specificity of the selected promoter
can be assessed, for example, by introducing a selectable marker under the control of the test
promoter in question and introducing this construct into various cells to assess the ability of
the promoter to selectively regulate expression.

The amplified fragments represent U3 promoter regions from any RNA
species expressed in the originating cells and their abundance will be in approximate
proportion to the number of expressed copies of RNA in the original mixture. Example 3
illustrates one example using a mouse T-helper cell cDNA library to produce amplified
fragments representing U3 regions expressed in T cells. The vectors were efficiently
expressed as RNA and protein in PA317 helper cells, and were transmitted by retrovirus into
human T-helper cells, where they were integrated and expressed as protein in the form of a
β-galactosidase reporter gene, as visualized by X-gal staining. The products of this experiment
are provided in Fig. 5 and as SEQ ID NOS: 2 and 3 from T-helper RNA. The products of
another experiment are shown in Fig. 6 as SEQ ID NOS: 4-13 from mouse liver RNA (by
RT-PCR).

Examination of the different U3 sequences isolated from T cells and from liver
revealed several things. First, the T cell U3 sequences were related to each other, as were the
liver sequences. However, the two types of U3 sequences were quite different between the
two sources (T-cell, Figure 5 and liver, Figure 6). Specifically, the liver sequences (Figure 6)
appeared to be a closely related group, differing mostly by single point mutations, some of
which may affect transcription factor binding sites. Some of the polymorphic sites included:
a phorbol ester response element (VLTRE); a Rel/NFκB binding region; and a possible
glucocorticoid response element (GRE). Some of these polymorphisms are illustrated in Fig.
6. The T cell-derived sequences (Fig. 5, SEQ ID NO:2 and 3), on the other hand, differed
significantly in length, with SEQ ID NO:3 missing more than 120 bases (compared with SEQ
ID NO:2) including putative binding sites for retinoids (RAR/RXR) and several elements
contained within the enhancer repeat region (including a cAMP response element (VLCRE,
or CREB/jun binding site), and putative serum response element (SRE, CARG, and
NF1/IL6)). SEQ ID NO:3 represented one out of five clones sequenced, while SEQ ID NO:2
represented four out of five. Possible sites of interactions between transcription factors and
DNA can be observed by comparing the experimentally derived U3 sequences with those in
Hodgson et al. (Retro-Vectors for Human Gene Therapy, 1996, Fig. 4.2, supra). In addition
to the deleted sequences of SEQ ID NO:2, there are a number of single base differences
within the conserved regions of the two T cell-derived sequences.

Advantageously, a number of new VL30 promoter sequences (SEQ ID NOS:
2-13, supra) were identified using these methods despite the fact that VL30 RNA comprises
only about 0.3% of cell mRNA represented in a cDNA library. Moreover, in each case, the
cloned insert was isolated without the need to use linkers, adapters, or multiple cloning
sequences such as those that are typically used for other library construction methods. The
promoter sequences can be used in the vectors disclosed here to express inserted foreign
genes or the promoter sequences can be substituted into other retroviral vectors, such as
MoMLV-derived vectors or other VL30-derived vectors. Further, vectors containing the
promoter sequences can be propagated in retroviral helper cells, such as PA317 (U.S. Patent
4,861,719 to Miller) or introduced into cells by chemical or physical transfection.

In another application of the methods of this invention, libraries of amplified
sequences can be incorporated into vectors using two or more fragments and using the
restriction endonucleases cleaving at a distance from their recognition sites. Preferably the
vectors are created using six or more fragments and more preferably ten or more
fragments. For example, as applied to VL30 promoter sequences, because there are over a
hundred VL30 retro-elements in the mouse genome, it is possible to amplify all of the
promoter sequences en masse, and propagate them en masse, enabling screening by serial
passage through helper cells (such as the PA317 helper cell line) or by means of a replication
competent retrovirus, as illustrated in Examples 3 and 4. Conversely, the promoter region
may be broken down into several sub-domains and permutations of each could be combined
and screened to enhance the chances of generating a superior construct (Fig. 4B).

As an example of breaking a promoter region down into several sub-domains,
Fig. 7 illustrates a similarity plot of nucleotide sequences found in VL30 U3 regions. Plot
similarity was performed using the Plot Similarity program (Wisconsin Sequence Analysis
Package, release 8.1, Genetics Computer Group, Madison, WI). This program plots the
running average of the similarity among the sequences in a multiple sequence alignment. The
sequences compared were those found in Fig. 4.2 of Hodgson, 1996, chapter 4 (infra). That
is, the plot discloses the degree of conservation of VL30 promoter sequences among known
VL30 promoters. From the figure, it can be seen that conserved sequences (close to 100%
conserved) can be used as primer binding sites to amplify the adjacent sequences by PCR.
An allelic mixture of three fragment sets is then created to make a combinatorial library of
promoters that can be positively selected, such as by using retroviral amplification of the
active sequences. This plot, used in combination with Fig. 4.2 (Hodgson, 1996, chapter 4,
supra), can be used to determine regions of high similarity. Regions of high similarity within
the U3 region can be replaced with one another. Therefore, a library of permutations of these
sections can be made by combining allelic pools obtained by amplifying the sequences from
individual subsections, followed by ligating the subsections in the correct order using the
methods of the instant invention for gene self-assembly. For example, sub-section 1 can
include the distal enhancer (from the LTR 5'-end to the site of insert primer 2; see, for
example, the region defined by the insert primers 1 and 2 (SEQ ID NOS 55 and 56 of
Example 4)). In this way, using a plot similarity (such as Fig. 7), within each sub-section, the
primers position fragments within a region of nearly 100% identity. Degenerate primers can
also be used in these experiments to account for multiple nucleic acid base combinations
along a particular sequence. In each case, the primers preferably are designed to have a
melting temperature that is compatible with the RT-PCR conditions being used, and the
conditions should be those recommended by the manufacturer (preferably Perkin Elmer
Corp., Emeryville, CA). In Example 4, a set of primers is given that can be used to amplify
different U3 subsections, together with directions for assembling a combinatorial library.
It will be appreciated by persons of ordinary skill in the art that the methods
of the instant invention can thus be used to make allelic libraries of a variety of genes. For
example, different allelic portions of a gene can be combined in a predetermined order and 
orientation to produce combinatorial libraries, without the need for fortuitous restriction sites 
separating the parts in the original construct, and without perturbing the important sequences 
joining the parts using the methods of this invention. 
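
The size and content of such a combinatorial library can be sketched in Python. The sketch models ordered assembly as simple string concatenation of one allele from each pool, which is an idealization of the ligation described above; the pools and sequences shown are hypothetical placeholders.

```python
from itertools import product


def combinatorial_library(pools):
    """Join one allele from each pool, in the fixed pool order."""
    return ["".join(parts) for parts in product(*pools)]


def library_size(pools):
    """Number of distinct ordered combinations across the pools."""
    size = 1
    for pool in pools:
        size *= len(pool)
    return size


# Three sub-sections with 2, 1 and 3 allelic variants (hypothetical).
pools = [["AAA", "AAT"], ["CCC"], ["GGA", "GGC", "GGT"]]
library = combinatorial_library(pools)
print(library_size(pools), library)
```

Because the order and orientation of the parts are fixed by their overhangs, the library grows as the product of the pool sizes rather than with every possible arrangement of the fragments.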

In this invention primers are constructed as described above. However, for the
generation of allelic libraries or more complex library constructs it may be helpful to include
5' tags at the 5' end of the primer. The purposes of the tag sequence are: 1) to provide extra
nucleotides on both sides of the restriction endonuclease recognition sites (for more efficient
digestion); and 2) to enable recovery of sequence tags or undigested fragments by means of
an affinity reagent (such as silica, magnetic beads, or nitro-cellulose containing the
complementary sequences) for purification. The use of an affinity reagent permits the
digested ends to be purified away from the digested fragments. Furthermore, if any
undigested ends remain after thorough digestion, the affinity reagent will remove them,
further aiding in the purification. In one embodiment, affinity purification of the digested
fragments is used in place of gel isolation, eliminating possible damage caused by ultraviolet
light as well as possible damage caused by dye (e.g., ethidium bromide) binding to the DNA.

It will also be appreciated that a number of other variations to the primer
sequences can be employed. For example, as discussed above, the enzyme recognition site
for an enzyme that digests outside of its recognition sequence is included in the primer, so
that the DNA digest creates an overlapping end that is complementary to one other terminus
to which it will be joined. The enzyme recognition site can be moved to any location within
the primer so as to digest the DNA at the exact location desired. The primer can also be
programmed with a novel enzyme recognition sequence to add any desired sequences
between the two sequences to be joined or to incorporate a linker or adapter if desired. If the
sequences to be amplified contain the enzyme recognition site of the primers, it may be
necessary to switch to a different enzyme. The use of several different enzymes is
possible and has been discussed above. As with other PCR procedures, after the initial primer
selections have been made the primers are assessed for their ability to fold back on
themselves or to create internal secondary structure. The primers are preferably modified to
avoid palindromic sequences or the potential for self folding within a primer. Nucleic acid
analytical software (such as the Wisconsin GCG package, Oxford Biomolecular, Oxford, UK)
is available to perform this analysis and aid in the selection of alternative primers.
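
The fold-back assessment described above can be sketched in a few lines of Python. This is
a minimal illustration under stated assumptions, not the analysis performed by the cited
software: the function names, the 4-base stem length and the 3-base minimum loop are
hypothetical choices made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def hairpin_stems(primer, stem=4, min_loop=3):
    """List (i, j, kmer) where primer[i:i+stem] could base-pair with primer[j:j+stem].

    stem and min_loop are assumed thresholds for this sketch.
    """
    s = primer.upper()
    hits = []
    for i in range(len(s) - stem + 1):
        kmer = s[i:i + stem]
        target = reverse_complement(kmer)
        # the pairing partner must lie downstream, leaving room for a loop
        j = s.find(target, i + stem + min_loop)
        while j != -1:
            hits.append((i, j, kmer))
            j = s.find(target, j + 1)
    return hits
```

A primer for which `hairpin_stems` returns an empty list has no perfect stem of the chosen
length; any hit marks a candidate region to redesign.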

In addition, as with all PCR processes, it is necessary to determine the melting
temperatures (Tm), and to adjust the annealing temperature of the PCR reactions to
compensate for such temperatures. Finally, it is important to perform a sequence redundancy
search, to determine whether the target sequence (the sequence complementary to the primer)
is found more than once in the region to be amplified. If the sequence is repeated, it will be
necessary to use a different primer in order to establish the single, correct priming site.
Preferably, no more than 6-8 bases of incorrect target complementarity are permitted at the
3'-end of the complementary region, and a difference of at least 10°C is allowed between the
Tm values of the correct and the incorrect target. The annealing temperature should always be
at least 5°C lower than the Tm of the correct target and 5°C above the Tm of the incorrect
target. Again, the necessary software and instructions are readily available from the cited
sources (Wisconsin Genetics Computer Group and Oxford Biomolecular, supra).
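
The temperature and redundancy rules above can be illustrated with a short Python sketch.
It is only an approximation under stated assumptions: the Tm is estimated with the simple
rule of thumb of 2°C per A/T and 4°C per G/C, which is swapped in here for the cited
software and is reasonable only for short oligonucleotides, and all function names are
hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rule-of-thumb Tm in degrees C: 2 per A/T plus 4 per G/C (assumed estimate)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def annealing_window(tm_correct, tm_incorrect, margin=5, min_gap=10):
    """Return (lowest, highest) allowed annealing temperature.

    Returns None when the two Tm values differ by less than min_gap degrees.
    """
    if tm_correct - tm_incorrect < min_gap:
        return None
    return tm_incorrect + margin, tm_correct - margin


def count_overlapping(haystack, needle):
    """Count occurrences of needle in haystack, allowing overlaps."""
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        start = haystack.find(needle, start + 1)
    return count


def priming_site_count(region, target):
    """Count occurrences of the target on both strands of the region to be amplified."""
    region, target = region.upper(), target.upper()
    forward = count_overlapping(region, target)
    reverse = count_overlapping(reverse_complement(region), target)
    return forward + reverse
```

A `priming_site_count` greater than one signals a repeated target, in which case a
different primer would be chosen as described above. A palindromic target is counted once
per strand by this sketch.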

Next, a vector is constructed to include the appropriate elements for expression
in the desired cell type. For example, the plasmid of Fig. 3A can be used for the creation of a
promoter library, or a vector can be created using a commercially available vector and primers
to create a three or more fragment annealing and ligation reaction as provided above.
Preferably, a dominant selectable marker is included on the vector (e.g., the
neomycin phosphotransferase gene, conferring G418 drug resistance) to reduce
the likelihood that cells without the vector are being maintained in culture.

Multiple allelic copies of DNA (cell derived or cDNA) can be amplified in
separate reactions as a set of potential inserts, with each set having its own unique overlap
sequence following digestion with a restriction endonuclease, according to this invention.
The fragments can then be ligated into an existing vector or in a single reaction of three or
more fragments to form a combinatorial collection of potential alleles. For example, if six
adjacent regions are amplified from five separate alleles, the number of combinations would
be 5^6 or 15,625 potential combinations. The combinations can then be grown en masse, and
selected in vitro or in vivo. A variety of screening strategies can be used in this invention and
those of ordinary skill in the art will appreciate that the type of screen will match the type of
library being generated. Therefore, for the promoter library, introducing members of the
library into particular cell types to assess for expression in one or more cell types versus the
absence of expression in another cell type provides evidence of tissue-specific or cell-specific
expression. For screening purposes, the libraries of this invention function like libraries
created through other methods. A variety of screening methods for a variety of libraries have
been described in the art. For example, selective screens are reviewed by Hodgson et al.
(1996, RG Landes Company, supra). Reporter protein production is well known in the art, as
is dominant selectable marker (e.g., drug) selection; selection by fluorescence activated
cell sorting, antibody affinity selection, phage display selection (such as commercially
available from Amersham, Milwaukee, WI), and the like can be used without detracting from
this invention.
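
The combinatorial count given above (six regions, five alleles each) can be checked with a
short Python sketch. The region and allele labels are hypothetical placeholders; only the
arithmetic is taken from the text.

```python
from itertools import product
from math import prod


def library_size(alleles_per_region):
    """Number of ordered constructs: the product of the allele counts per region."""
    return prod(len(alleles) for alleles in alleles_per_region)


def ordered_combinations(alleles_per_region):
    """Yield one tuple per construct, keeping the regions in their fixed order."""
    return product(*alleles_per_region)


# Six adjacent regions, five hypothetical alleles of each region
REGIONS = [[f"region{r}_allele{a}" for a in range(1, 6)] for r in range(1, 7)]
```

Because each region keeps its position in the tuple, the enumeration mirrors the ordered,
directional joining described in the text rather than a free shuffling of fragments.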

In this way, it is possible to isolate multiple forms of genes, gene fragments or
regulatory regions such as transcriptional promoters or packaging signals (for example, in a
retro-vector system). The individual constructs may then be tested in vitro or in vivo to further
characterize a particular phenotype.

In one example the method is used to create a library of complementarity
determining regions (e.g., allelic variations that give rise to antibody diversity) of antibodies
or from receptors, including T-cell receptors, epitopes, antigens, ligands and the like. For
example, where a library of T-cell receptors is created, a vector designed
to create a functioning T-cell receptor can be introduced into T cells or T-cell progenitors and
the cells can be tested for their ability to bind to a particular test ligand. The ligand-recognizing
cells can then be isolated from the ligand and grown in the presence of cytokines
to produce specialized T cell clones. Where a library of antibodies or antibody fragments is
created, the antigen reactive portions, for example, can be recombined in a vector containing
the remaining portions of an antibody molecule to generate antibodies or antibody fragments
in a cell. In other examples, the methods of this invention can be used to create allelic
domains of receptor families (such as the steroid receptor super-family); libraries with related
regions from peptide hormones; cytochromes P450; or other protein families that have shared
domains or sub-sections with similar structures. The methods of the instant invention allow
the joining of allelic sub-sections in an ordered fashion. In each case, it will be necessary to
design primers, and to keep track of the uniqueness of joining overlaps and the presence of
internal restriction sites as described above. While these will be different in each case, some
general guidelines that are incorporated into the method of the instant invention are listed
here.

As discussed above, although described as it relates to promoter libraries,
libraries of other nucleic acid sequences can be created using the methods of this invention.
These libraries include intron and/or exon and/or functional domain libraries, libraries of
potential alleles for a particular gene sequence, and the like. These sequences can be
amplified from cell DNA or RNA using the primers of this invention and incorporated into a
variety of vectors. For example, one vector of this invention, VLBPGN, has a portion of
the LTR removed and can be used to create a variety of libraries following digestion with BpmI.

Selected or screened products of the combinatorial library can be used for gene
expression, such as the promoters of Figs. 5 and 6. In addition, to exploit these
sequences for the expression of a variety of genes, the LTR fragment containing the promoter
can be joined to one or more functional retroviral packaging signals, internal ribosome entry
sites, additional promoters, coding regions, processing sites, and the like.

Advantageously, there are almost no spatial constraints upon the joining of
molecules by the method of the instant invention. Other methods have not taken advantage
of the combination of: PCR to isolate genes or gene fragments; enzymes cleaving at a site
distant from their restriction endonuclease recognition site to combine three or more
fragments with precision; and the use of unique overlapping non-palindromic termini to
ensure fidelity of multi-fragment ligations. This combination permits the artisan to prepare
complex gene constructions in one ligation step and does not require sequential sub-cloning
into a vector or propagation in a prokaryotic host. In addition, the combination of fragment
pools by these methods facilitates recombinatorial genetics.

The ability to recombine (in the correct order and direction) and screen a large
number of allelic variants (whether as a simple library or as a combinatorial library), resulting
in increased abundance (by amplification in the RNA, and subsequently in the DNA) is a
special characteristic of this invention. Particular advantages of this system are obtained
when the methods of this invention are combined with retrovirus vector technology or other
virus vector technology. For example, the combination provides a form of in vitro evolution
whereby the passage of the library through virus and through cells selects functioning
sequences and increases the abundance of the surviving RNA and DNA molecules.

For example, consider the consequences of screening several different
promoters expressing RNA in a donor cell (i.e., a cell producing virus particles), but at
differing levels of RNA abundance. In the following example, the least abundant RNA
species is expressed at 0.1 copy of RNA per cell, while six others are expressed at 1 copy, 10
copies, 100 copies, 1,000 copies, 10,000 copies, or 100,000 copies/cell, respectively. After
a single passage, the DNA copy number in the recipient cells now reflects the approximate
RNA copy number in the donor cells. These numbers are further amplified in the relative
abundance of RNA species produced in the recipient cells. Disregarding factors such as
position effects, transcription factor depletion, etc. (which may be considerable), the same
relative ratios of expression would be expected. Taking into consideration position effects,
the disparity in abundance caused by changing insertion loci should average out. The
most abundant RNA species after two passages is then many orders of magnitude more
abundant than the least abundant.

Species:   RNA             DNA copy   RNA       DNA copy   RNA
           abundance:      no.        abun.     no.        abun.
           P=0             P=1        P=1       P=2        P=2

A          0.1 copy/cell   0.1        0.01      0.01       0.001
B          1               1          1         1          1
C          10              10         100       100        1,000
D          100             100        10,000    10,000     10^6
E          1,000           1,000      10^6      10^6       10^9
F          10,000          10,000     10^8      10^8       10^12
G          100,000         100,000    10^10     10^10      10^15

Table 2. Enhancement of DNA and RNA copy number as a result of different RNA
expression levels, after retroviral passage. P= (no. of passages). Numbers are interpreted as
relative ratios within a column.
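
The progression shown in Table 2 can be reproduced with a short Python sketch. It assumes,
as the surrounding text does, that position effects are ignored, that each species starts
as a single DNA copy, and that the DNA copy number after a passage equals the RNA
abundance before it. The function and variable names are hypothetical.

```python
# RNA copies expressed per DNA copy for species A to G (values from the text)
RNA_PER_DNA = {"A": 0.1, "B": 1, "C": 10, "D": 100,
               "E": 1_000, "F": 10_000, "G": 100_000}


def passage_table(rna_per_dna, passages=2):
    """Return {species: [RNA P=0, DNA P=1, RNA P=1, DNA P=2, RNA P=2, ...]}."""
    rows = {}
    for name, rate in rna_per_dna.items():
        dna = 1                      # assumed: one DNA copy in the donor cell
        row = [dna * rate]           # RNA abundance at P=0
        for _ in range(passages):
            dna = row[-1]            # DNA copy no. mirrors the previous RNA abundance
            row.extend([dna, dna * rate])
        rows[name] = row
    return rows
```

The values are relative ratios within a column, as in the table caption; the fractional
entries for species A are subject to ordinary floating-point rounding.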

The present invention is able to efficiently create a library of RNA or DNA
sequences whether or not they are in low abundance. The kinetics of screening for RNA
abundance of a promoter can be appreciated best in the following discussion. For the
purposes of this discussion, position effects have been ignored. An equation describing the
kinetics of screening for RNA abundance is:

(1)  $R_{\mathrm{rel}\,\chi} = A_{\chi} \big/ \sum_{n} A_{n}$

The above equation (1) can be stated in plain English: The relative abundance
of an RNA species χ ($R_{\mathrm{rel}\,\chi}$, in a population of RNA molecules expressed in a single cell
or within a population of cells) is equal to the RNA copy number of RNA species χ ($A_{\chi}$)
divided by the sum of the RNA copies of all RNA species present, including χ.

The relative abundance number of any given species changes as the number of 
passages change, according to the following approximation: 



(2)  $R_{\chi P_Y} = D_{\chi P_0} \cdot R^{\,Y+1}$

In the simplest of terms, equation (2) can be expressed as: The abundance
of RNA species χ after Y passages ($R_{\chi P_Y}$) is equal to the initial abundance of the DNA for
species χ at passage=0 ($D_{\chi P_0}$), multiplied by the RNA abundance per DNA copy ($R$) raised to
the power of the number of passages plus one. Thus, a typical RNA species that starts out as a
single copy of DNA, after zero passages (i.e., in the donor cell) expresses 10 copies of
RNA/cell. After one passage it is amplified at the DNA level to a relative ten copies (the
same as the RNA abundance at P=0), and at the RNA level to 100 copies (10 copies per DNA
copy). The reason for the amplification is that viral packaging and passage is based upon the
number of RNA copies present in the donor cell. These calculations can be used to arrive at
approximate abundance determinations for any given passage. The actual results of any given
experiment, of course, will be biological rather than physical or mathematical. This means
that other variables such as RNA efficiency of transmission and longevity, availability of
transcription factors, experimental variation, etc. also come into play. The underlying
purpose of the approximating equations, however, is to illustrate that RNA is amplified in
DNA in proportion to the abundance of the template (RNA) within the cell.
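
As a worked illustration, the two approximations can be combined to give the fraction of
the total RNA contributed by one species after Y passages. The symbols $r_x$ (RNA copies
per DNA copy of species x), $D_x$ (DNA abundance at passage 0) and $F_x$ (fraction of
total RNA) are introduced here for the example only and are not part of the original
notation; position effects are ignored, as in the discussion above.

```latex
% r_x : RNA copies expressed per DNA copy of species x (assumed constant across passages)
% D_x : DNA abundance of species x at passage 0
\begin{align*}
R_{x}(Y) &= D_{x}\, r_{x}^{\,Y+1} \\
F_{x}(Y) &= \frac{R_{x}(Y)}{\sum_{n} R_{n}(Y)}
          = \frac{D_{x}\, r_{x}^{\,Y+1}}{\sum_{n} D_{n}\, r_{n}^{\,Y+1}}
\end{align*}
% Example: two species at one DNA copy each, with r_1 = 10 and r_2 = 100
\begin{align*}
F_{2}(0) &= \frac{100}{10 + 100} \approx 0.909 \\
F_{2}(2) &= \frac{100^{3}}{10^{3} + 100^{3}}
          = \frac{10^{6}}{1.001 \times 10^{6}} \approx 0.999
\end{align*}
```

In this two-species example, a tenfold difference in expression per DNA copy grows to a
thousandfold difference in RNA abundance after two passages.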

The abundance of mRNA in cells can vary continuously from less than a copy
per cell to nearly 100,000 copies/cell in actively transcribing, highly-specialized cells such as
reticulocytes, the chicken oviduct, the silk moth silk gland, etc. Therefore, the spectrum of
RNA abundance from 0 to 10^5/cell is within the biological window of interest. For most
practical purposes, such as biotechnological expression of genes in specific cells, only the
higher end of this abundance range is desired. Therefore, using a viral selection system, as
disclosed in this invention, it may be possible to disregard those species with less than a
threshold level, such as <0.1 copies per cell. The selection through virus will lead to the
recovery of the more abundant species. Furthermore, because the vector is likely to be the
only considered sequence, it may be considered as a proportion of the whole of RNAs
expressed in the target cell. The situation is more complex when a large number of
permutations and combinations is generated, for example by self-assembling thousands or
millions of fragments in a predetermined order using the self-assembly technique of the
instant invention. Consider the assembly of allelic variants of four promoter subregions:
distal enhancer, proximal enhancer, distal promoter and proximal promoter. If 100 varieties of
each of the four groups were amplified and combined using the instant process along with a
single vector, 10^8 resultant combinations could occur. However, a sufficient number of
molecules to start out a combinatorial screening program might be a million. The problem
can be simplified by considering these in groups as follows:

Table 3. Grouped abundance of RNA molecules derived from combinations.

No. of species   RNA          Total No. RNA    RNA at         RNA at         RNA at
in group:        abundance:   molec. at P=0:   P=1            P=2            P=3

9 x 10^5         1            9 x 10^5         9 x 10^5       9 x 10^5       9 x 10^5
2 x 10^5         10           2 x 10^6         2 x 10^7       2 x 10^8       2 x 10^9
2 x 10^4         100          2 x 10^6         2 x 10^8       2 x 10^10      2 x 10^12
1 x 10^3         1,000        1 x 10^6         1 x 10^9       2 x 10^12      2 x 10^15
1 x 10^1         10,000       1 x 10^5         1 x 10^9       1 x 10^13      1 x 10^17
1                100,000      1 x 10^5         1 x 10^10      1 x 10^15      1 x 10^20

Sum Total:                    6.6 x 10^6       1.11 x 10^10   1.01 x 10^15   1 x 10^20

Thus, it follows that in the example population (Table 3) of over a million
constructs (equally represented in the DNA), a single construct expressing 10^5 copies of RNA
per DNA copy will increase to approximately 99% of the total expressed RNA sequences in
two passages. Using similar procedures in combination with drug and/or hormonal
stimulation, and after consideration of the possible transcription factor binding sites within
the sequence family (Figs. 5 & 6), it is within the intended scope of the invention to select for
hormonal or pharmacological controls of transcription such as have been described herein.
The factors contributing to the outcome are not only the input constructs, but recombinants
and mutants as well. These secondary contributors to molecular diversity will be enhanced if
multiple rounds of infection are allowed to occur, as oftentimes the difference between a
particular transcription factor being able to bind (or not) may depend upon a single base
change. Because viral infection is progressive and competitive, molecular evolution can be
used to generate gene constructs de novo in the tissue culture dish in short time periods.
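
The enrichment of a single highly expressed construct can be illustrated with a short
Python sketch. The group sizes below are assumptions chosen to total just over a million
constructs; they are illustrative only and should not be read as the exact values of
Table 3. Each group's total RNA after a passage is taken as the group size multiplied by
the RNA abundance per DNA copy raised to the power of the number of passages plus one,
with position effects ignored.

```python
# (number of species in group, RNA copies per DNA copy); group sizes are assumed
GROUPS = [
    (900_000, 1),
    (200_000, 10),
    (20_000, 100),
    (1_000, 1_000),
    (10, 10_000),
    (1, 100_000),
]


def group_rna_totals(groups, passage):
    """Total RNA per group after `passage` passages: count * rate ** (passage + 1)."""
    return [count * rate ** (passage + 1) for count, rate in groups]


def top_group_fraction(groups, passage):
    """Fraction of all RNA contributed by the last (most highly expressed) group."""
    totals = group_rna_totals(groups, passage)
    return totals[-1] / sum(totals)
```

Under these assumed group sizes, `top_group_fraction(GROUPS, 2)` comes out just under
0.99, in line with the approximately 99% stated above for two passages.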

Advantageously, the use of primers to generate amplified fragments with uniquely
complementary cohesive ends (i.e., ends that will preferably hybridize only with the
intended 5' and 3' fragments) to ligate three or more fragments as taught in this invention
improves the potential for obtaining a diverse library.

Although the examples particularly point out a transcriptional promoter as the
product of the process, the skilled artisan can appreciate that a particular selection technique
can be applied to other cis- and trans-acting genetic sequences as well. Although a virus is
used to propagate the selective advantage in a preferred embodiment, it can also be
appreciated that any selective screen, such as drug selection, cell survival, phenotypic
selection, cell sorting, antibody selection, and the like (see Ausubel et al., supra) could be
substituted without changing the intended scope of the invention. Likewise, transfection or
cell fusion could be used in place of viral infection. Furthermore, substitution of different
viruses, retrotransposons, or functional groups is likewise within the intended scope of the
invention. The described embodiments are to be considered only as illustrative and not
restrictive, and the scope of the invention is indicated by the claims rather than by the
narrative description. All references and publications cited herein are incorporated by
reference into this disclosure.

Like the embodiments detailed above, the method of library production is also
conducive to assembly and transfer of genetic material directly into eukaryotic cells, saving
the step of propagation in bacteria that is otherwise standard. An advantage of direct transfer
of the libraries of this invention to eukaryotic cells, including the exemplary retroviral vector
producer cells, is that certain essential cis-acting structural features will be under positive
selection (i.e., if they are not present, the molecule will be lost due to its non-functionality).
As discussed above, it is often advantageous to eliminate bacterial and plasmid DNA
sequences, endotoxin, and other bacterial contaminants by introducing the constructs directly
into eukaryotic cells.

In addition to providing a method for constructing complex DNA molecules
efficiently (as in the examples of three piece and six piece constructs), the methods of this
invention permit the assembly of constructs that are larger than those conventionally
propagated in E. coli. Examples of these types of vectors include adenovirus vectors, herpes
simplex vectors and artificial minichromosomes. In order to insert genes into such vectors
that are too large for conventional molecular cloning procedures, in the past it was often
necessary to resort to in vivo recombination, wherein the genes of interest are cloned into a
suitable vector and the flanking homologous regions are used to target the foreign genes to a
homologous site within the larger viral or minichromosome vector. However, the methods of
this invention permit PCR fragments of any size (up to the limits of PCR capability, 20-30 kb
per fragment) to be joined together. Thus, it is feasible to precisely construct adenovirus
vectors by amplifying larger sequences, and combining them by ligation. For example,
several sections of adenovirus (5-10 kb each) can be ligated using the methods of this
invention, up to, for example, about 37 kb, and then transformed directly into human cells.
Only the correctly recombined vectors are capable of replicating. Hence, the DNA is
autoselecting. A similar procedure is used for generating herpes virus vectors, which are
approximately 150 kb. The precision of the methods of this invention permits non-essential
viral genes to be more easily eliminated from the construct. After transfection into
appropriate cells, the DNA replicates and virus particles are formed.

Some special considerations apply to larger vectors, however. First, it is
desirable to use enzymes that do not cut within the large DNA fragments. To prevent
excessive fragmentation of the DNA by internal sites, it is desirable to use enzymes that cut
rarely or infrequently, such as CpG-containing enzymes recognizing six bases, or enzymes
such as SapI, recognizing seven bases and leaving a three bp overhang (thus permitting up
to 32 fragments to be joined in order). It is also desirable to avoid shearing the DNA once
large segments have been joined by ligation. One method of avoiding shear is to add the
transfection agent, such as Superfect™ reagent (dendrimers, Qiagen) or Lipofectamine™
(liposomes, Life Technologies, Gaithersburg, MD) directly to the ligation reaction, and then
add the cells to be transfected to the mixture. This, or a similar method, avoids the need to
physically move the ligated DNA, and thus prevents shearing. Another method is to add a
DNA condensing reagent (dendrimers, polycations [such as polyethyleneimine], histones or
liposomes) directly to the DNA ligation reaction, and then move the DNA by pipette after it
has condensed (thus reducing shearing of the DNA). Once inside the cell, viral DNA can
replicate (as in the examples of partially replication-competent adenovirus and herpes simplex
virus vectors).
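
The figure of 32 for a three-base overhang can be checked by enumeration. The Python
sketch below lists every possible overhang of a given length, pairs each with its reverse
complement, and sets aside self-complementary (palindromic) overhangs, which cannot serve
as unique joints. The function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang_pairs(length):
    """Return (palindromic overhangs, unique complementary overhang pairs)."""
    palindromes = []
    pairs = set()
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        partner = reverse_complement(seq)
        if seq == partner:
            palindromes.append(seq)          # self-complementary, not unique
        else:
            pairs.add(tuple(sorted((seq, partner))))
    return palindromes, sorted(pairs)
```

Overhangs of odd length can never be self-complementary, because the middle base would
have to pair with itself, so all 64 three-base overhangs fall into 32 complementary pairs.
The same enumeration gives 512 pairs for a five-base overhang.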

Artificial minichromosomes have been under development for years. True
artificial chromosomes require a centromere, at least one origin of DNA replication, and in
the case of linear molecules, telomeric repeats at the chromosomal termini. In addition, to be
very effective it is desirable to have a selectable marker gene, one or more therapeutic genes,
and/or reporter genes.

In reality, the use of minichromosomes has been delayed by the inability to
effectively manipulate the larger DNA molecules in vitro. Yeast and bacterial artificial
chromosomes have been used with little success in mammals, and the addition of telomeres to
the ends of linear chromosomes is also a special problem, as there is no prokaryotic host that
can tolerate large linear DNA. The methods of this invention offer the opportunity to
assemble human or mammalian minichromosomes in vitro, by using large segments (10-30
kb) of synthetic, gene-amplified DNA as ligation starting materials. For example, up to 32
SapI fragments (up to 30 kb each, containing the essential cis- and trans-acting sequences),
or 512 shorter HgaI fragments can be combined using these methods. As with the other
examples, several enzymes suitable for this invention (e.g., class IIS enzymes) can be
combined (possibly with different termini lengths) to simplify the task. The methods of this
invention also facilitate construction of telomeric repeats, because the constructs of this
invention do not need to be circular. Thus, the methods of this invention can be used to make
telomeres of any length, by adding additional segments onto the ends of molecules. One way
to do this is using self assembling genes that employ a repeating overhang sequence (a
self-complementary molecule, such as AG-3' at one end, and CT-3' at the other end), permitting
the telomeres to be lengthened to the extent desired by adding the required molar excess of
the telomeric repeat-containing fragment. This technique gives the investigator some control
over the relative length of the telomeres, although the self-complementarity indicates that
many repeats will be lost due to self-ligation. This can be alleviated by using higher starting
concentrations of DNA to favor inter-molecular ligations over intra-molecular ligations (e.g.,
>20 µg/ml starting concentration of DNA).

A two fold molar excess of telomeric fragments gives approximately twice the
average length of telomere as a strictly 1:1 molar ratio of all fragments. By using a higher
molar ratio of shorter telomeric repeats it is possible to give greater uniformity to the overall
length of the molecules, which will vary from one terminus to the other. Thus, in addition to
providing a way to build large molecules with precision, the methods of this invention
provide a way to control the telomere length (or potential life-span) of the artificial
chromosome. To prevent damage during handling, the minichromosome DNA can be
condensed with polycations, adenovirus particles, dendrimers, histones, or liposomes prior to
transfection, similar to larger viral vectors.

The methods of this invention can be used to create recombinant virus. One
example of this is an adenovirus vector self-assembling gene system. This system can
include three parts: 1) vector; 2) helper virus; and 3) helper cells. The vector part is a
self-assembling fragment set of at least three fragments comprising the essential cis-acting
sequences (left and right inverted terminal repeats [ITRs], which are the 103 bp at both ends
of the genome that are required for replication, and packaging sequences [Ψ, base pairs
194-358]) and a central 'baggage' area, comprising one or more self-assembling fragments
including therapeutic genes, marker genes, and reporter genes. The baggage area is thus
flanked by the cis-acting sequences in the vector. Because the synthetic oligonucleotide
sequences comprising the 5' and 3' termini of the helper virus are not phosphorylated, they
will not ligate together creating multimers. Thus, the Ad5 vector region will assemble only into
monomers. The helper virus part comprises all Ad5 trans-acting genes except for the E1A and
E1B genes. The helper virus part has no cis-acting sequences, and it is amplified in several
sections. In this preferred embodiment, the virus is amplified using primers that exclude the
ITRs, packaging region and E1A and E1B genes. The helper virus is digested with SapI,
creating seven uniquely terminated fragments comprising the trans-acting viral genome, with
dephosphorylated, blunt 5' and 3' ends on the terminating fragments. The primers are
designed so as to amplify the internal virus sequences without changing them, except for the
5' and 3' ends of the virus. The PCR-amplified fragments are digested with SapI and are
religated in their natural order after gel isolation and Qiagen column purification. The 5' end
of the helper virus genome starts at 3.2 kb (in the E1A gene) so as not to overlap the vector
sequences, which could otherwise cause replication competent adenovirus (RCA). Because
the 5' and 3' ends of the helper virus do not contain SapI sites, they remain intact after
digestion with SapI. Because the synthetic oligonucleotide sequences comprising the 5' and
3' termini of the helper virus are not phosphorylated, they will not ligate. Thus, the Ad5
helper virus genome assembles only into preferred monomers during ligation.

In a preferred embodiment, non-essential genes are deleted from the Ad5
genome by means of the method of self-assembling genes. In another preferred embodiment,
the helper virus genome is approximately 30 kb after deletion of E1A, E1B and E3 gene
sequences from the helper virus, and it is amplified as a single long fragment using the
eLONGase Amplification System (Life Technologies) or a similar strategy for creating long
PCR fragments with high fidelity. It is not of great importance that occasional PCR errors
may occur, because multiple copies of the Ad5 helper virus are transfected into target cells,
thus providing trans-complementation. The helper cells are preferably 293 cells, a human
kidney cell line expressing E1A and E1B genes (ATCC). The vector part and the helper virus
part are combined in equimolar ratios after ligation has been performed separately on each
fragment set. The Superfect protocol (Qiagen) is used to transfect the vector part and the
helper part into the helper cells. The helper cells lyse, releasing high-titer adenovirus
particles that are capable of infecting a variety of human cells. The resulting defective virus is
incapable of forming RCA, and it transmits up to 34 kb of foreign genes in the baggage area.
Unlike conventional Ad5 vectors that require separate constructs for E. coli propagation of
insert genes, and recombination in vivo, the present vectors are relatively easy to make and
provide a precise, safe alternative to first generation and second generation adenovirus
vectors.

Exemplary methods for producing self-assembling vectors and genes are
provided below. Further, the Examples provide methods for producing libraries of nucleic
acid sequences using the methods of this invention. A number of nucleic acid sequences
identified using the methods of this invention are described. The examples provided below
are exemplary and not limiting. All references and publications provided herein are
incorporated by reference into this disclosure.

Example 1 

Three-Piece Gene Self-Assembly with 100% efficiency 

Using 6 primers (SEQ ID NOS:24 and 63-67), three PCR fragments were amplified
from templates VLMG (SEQ ID NO:22) and VLBPGN (SEQ ID NO:1). PCR reactions were
carried out using the hot start technique, according to the manufacturer's instructions (Perkin
Elmer), using Pfu DNA polymerase (Stratagene). To amplify specific portions of the above
templates, each primer contained a class IIS enzyme site positioned so that digestion produced
a unique overhanging end that was complementary to only one other terminus in the subsequent
ligation. The class IIS enzymes used were BpmI and Eco57I (the latter was used to copy a
fragment that contained an internal BpmI site). The reactions were carried out as follows: 1)
the lower reaction was assembled according to the protocol for PCR Gems (Perkin Elmer); 2)
the lower reaction was heated to 80 °C for 5 min, then cooled to 4 °C for 5 min; 3) the upper
reaction was prepared according to the PCR Gems protocol and was added to the lower reaction
(separated by cooled wax). The primer concentration was 0.3 µM (final). The dNTP
concentration was 200 µM (final). 3 Units of Pfu polymerase were used. All fragments were
amplified using the following conditions: 96 °C, 45 sec; then 30 cycles of 96 °C 45 sec,
52 °C 45 sec, 72 °C 6 min; then a single incubation at 72 °C for 10 min; then hold at 4 °C.
All fragments were successfully amplified. The PCR fragments were purified using the
Qiaquick PCR purification protocol (Qiagen). The fragments were digested with an excess of
the appropriate restriction enzyme (BpmI or Eco57I). The digested fragments were run on a
1% agarose gel and were excised using minimal irradiation from a hand-held 365 nm
ultraviolet light. The fragments were purified using the Qiagen Qiaquick Gel Purification
Protocol. The fragments were ligated at an equimolar ratio at a concentration of >20 µg/ml
with T4 DNA ligase (Boehringer Mannheim) overnight at 4 °C. Competent E. coli SCS110
cells (Stratagene) were transformed with the ligated DNA. Eight colonies were characterized
by restriction enzyme analysis, and all eight contained the correct order and orientation of
the three fragments. The experiment was repeated independently by another investigator, and
the same result was obtained (8/8 = 100%). Thus, the procedure resulted in a high percentage
of correctly assembled vectors.
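The requirement that each overhang pair with only one other terminus limits how many distinct junctions a short overhang can encode. The sketch below is an illustration only, not part of the disclosed procedure. It assumes 2-base overhangs (the length Example 2 gives for the enzymes used) and excludes self-complementary overhangs, which could anneal to a second copy of the same fragment end. The function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def usable_overhang_pairs(length=2):
    """List complementary overhang pairs, skipping self-complementary ones.

    A self-complementary overhang (for example AT) equals its own reverse
    complement, so it cannot enforce a unique pairing and is left out.
    """
    pairs = set()
    for bases in product("ACGT", repeat=length):
        overhang = "".join(bases)
        partner = revcomp(overhang)
        if overhang != partner:
            pairs.add(tuple(sorted((overhang, partner))))
    return sorted(pairs)


if __name__ == "__main__":
    for first, second in usable_overhang_pairs():
        print(first, "pairs with", second)
```

Under these assumptions a 2-base overhang offers six mutually exclusive pairings, which covers the three junctions of a circular three-piece assembly.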

This three-piece vector was VLΔBP. The deletion extended from the distal
enhancer region to the TATA box near the start of transcription. The deletion region
contained a pair of BpmI sites that permitted U3 sequences to be cloned into the insert.

One validated E. coli clone of VLΔBP was transfected into retroviral helper
cells. After 48 h, the vector was transduced into amphotropic helper cells. After selection for
two weeks with the drug G418, drug-resistant colonies were grown up in a mass culture and
the vector was transduced from the amphotropic helper cells into a human HT1080 cell line
(ATCC, Rockville, MD). Surprisingly, even with a large deletion in the LTR promoter, the
basal TATA box-containing VLΔBP was transmitted as a retrovector and was permanently
inserted into the human cell line, thus establishing the validity of the self-assembly technique
for the construction of functional eukaryotic vectors.

Example 2 

Production of a Six Piece Self-Assembling Expression Vector 

Due to the high efficiency of the gene self-assembly process for the three-piece
assembly, a complex vector containing six fragments was constructed. The results here were
extended to determine whether such a self-assembled vector would also have biological
activity in human cells without being cloned and grown in a prokaryotic cell.

Six fragments were individually constructed by PCR using three different
templates and twelve primers (as illustrated in Fig. 8). The primers used three different class
IIS enzymes. The enzymes were chosen so as to give 2 base pair, 3'-overhanging ends. Three
enzymes were used in order to avoid the use of enzymes that had additional sites internal to
the fragments being amplified. Thus, BpmI was used unless there was an internal BpmI site.
If such a site existed, Eco57I was used. If there was also an internal Eco57I site, then BsrDI
was used. However, it is alternatively possible to use an enzyme such as Eam1104I, where
the Eam1104I sites in the primers are unmethylated (therefore susceptible to digestion by the
enzyme), and wherein the m5dCTP analog of dCTP is used in the PCR reaction, methylating
all internal sites (and protecting them from digestion by Eam1104I), as suggested by Padgett
and Sorge, 1996, supra, and incorporated herein by reference.
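The enzyme choice described above is a simple fallback rule, which can be sketched as follows. This is a minimal illustration, assuming the commonly catalogued recognition sequences CTGGAG (BpmI), CTGAAG (Eco57I) and GCAATG (BsrDI), which are not stated in the text. The function names are hypothetical, and both strands of the fragment are checked for internal sites.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Order of preference as described in the text; recognition sequences are
# assumed from general restriction enzyme catalogues, not from this document.
ENZYMES = [("BpmI", "CTGGAG"), ("Eco57I", "CTGAAG"), ("BsrDI", "GCAATG")]


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def has_internal_site(fragment, site):
    """True if the site occurs on either strand of the fragment."""
    return site in fragment or revcomp(site) in fragment


def choose_enzyme(fragment):
    """Return the first preferred enzyme with no internal site, else None."""
    fragment = fragment.upper()
    for name, site in ENZYMES:
        if not has_internal_site(fragment, site):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical template regions, for illustration only.
    print(choose_enzyme("ATGCATGCATGC"))        # no internal sites
    print(choose_enzyme("ATGCTGGAGCATGC"))      # internal BpmI site
```

A result of None would signal that none of the three enzymes is usable, in which case the methylation approach mentioned above is one alternative.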

Using 12 primers, 6 fragments were amplified from 3 templates: pBK-CMV
(SEQ ID NO:26), pVLMB (SEQ ID NO:23) and pVLOVhGH-900 (SEQ ID NO:21).
Fragment 1 was amplified from pBK-CMV using primers 1 and 2 (SEQ ID NOS:31 and 32).
Fragment 2 was amplified from pVLMB using primers 3 and 4 (SEQ ID NOS:33 and 34).
Fragment 3 was amplified from pVLOVhGH-900 using primers 5 and 6 (SEQ ID NOS:35
and 36). Fragment 4 was amplified from pVLMB using primers 7 and 8 (SEQ ID NOS:37
and 38). Fragment 5 was amplified from pVLMB using primers 9 and 10 (SEQ ID NOS:39
and 40). Fragment 6 was amplified from pVLMB using primers 11 and 12 (SEQ ID NOS:41
and 42). PCR reactions were carried out using the hot start technique, according to the
manufacturer's instructions (Perkin Elmer Ampliwax PCR GEMS 100). The lower reaction
was heated to 80 °C for 5 min, then cooled to 20 °C for 5 min. The upper reaction was
prepared according to the PCR Gems protocol and was added to the lower reaction (separated
by cooled wax). The primer concentration was 0.3 micromolar (final). The dNTP concentration
was 200 µM (final). 5 U of polymerase (Stratagene) was used per reaction. 100 ng of
template was used for each reaction. 14 rounds of PCR amplification were used to reduce
mutagenesis of the templates. The PCR cycling protocol was 96 °C 45 sec; then two cycles
of (96 °C 45 sec, 52 °C 45 sec, 72 °C 6 min); then 12 cycles of (96 °C 45 sec, 58 °C 45 sec,
72 °C 6 min), followed by a 72 °C soak for 10 min, then a 4 °C hold.
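The two-stage cycling protocol above can be written down as data and checked against the stated total of 14 rounds. The sketch below is illustrative only: the structure and names are hypothetical, ramp times between temperatures are ignored, and the open-ended 4 °C hold is left out.

```python
# Each step is (temperature in degrees C, hold time in seconds).
INITIAL = [(96, 45)]
STAGES = [
    {"cycles": 2, "steps": [(96, 45), (52, 45), (72, 360)]},
    {"cycles": 12, "steps": [(96, 45), (58, 45), (72, 360)]},
]
FINAL = [(72, 600)]


def total_cycles(stages):
    """Number of amplification rounds across all cycling stages."""
    return sum(stage["cycles"] for stage in stages)


def programmed_seconds(initial, stages, final):
    """Sum of programmed hold times, ignoring ramping and the final hold."""
    cycling = sum(
        stage["cycles"] * sum(seconds for _, seconds in stage["steps"])
        for stage in stages
    )
    return (
        sum(seconds for _, seconds in initial)
        + cycling
        + sum(seconds for _, seconds in final)
    )


if __name__ == "__main__":
    print("rounds:", total_cycles(STAGES))
    print("minutes:", programmed_seconds(INITIAL, STAGES, FINAL) / 60)
```

The 6 min extension step dominates the run time, so the programmed holds alone come to just under two hours.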

The six PCR fragments were designed to self-assemble into a retro-vector after
digestion with the correct class IIS restriction enzyme (Fig. 8). After transfection into
retroviral helper cells, the vector DNA is transcribed as RNA by means of the
cytomegalovirus immediate early promoter (fragment 1). This promoter replaces the
retroviral or VL30 LTR in this vector. The RNA transcript region begins with the R and U5
regions of the Moloney murine leukemia virus (MoMLV) LTR, the viral packaging signal
(Ψ) region of MoMLV, the packaging enhancer (Ψ+) region of mouse VL30 and the IRES
region of EMCV (fragment 2). Fragment 3 consists of the human growth hormone (hGH)
cDNA sequence. Fragment 4 consists of the SV40 virus early region promoter driving
expression of the neomycin phosphotransferase (neo) gene. Fragment 5 consists of the
(+)-strand primer binding site of the MoMLV LTR, the U3 region of the MoMLV LTR, the
repeat (or R) region, and a portion of the U5 region. Fragment 6 consists of the pBR322
plasmid origin of replication.



Fragment 1: CMV early region promoter (BpmI)
Template: pBK-CMV plasmid DNA (Stratagene, La Jolla, CA) (SEQ ID NO:26)
PCR primer 1 (SEQ ID NO:31):
GACTAACCTTGATTCCACTGGAGCCGTATTACCGCCATGCATTAGTTATTAATAG
PCR primer 2 (SEQ ID NO:32):
GACTAACCTTGATTCCACTGGAGTAATTGCGGCTAGCGGATCTGACG

Fragment 2: R-U5-Psi-Psi(+)-IRES (BpmI)
Template: pVLMB plasmid DNA (SEQ ID NO:23)
PCR primer 3 (SEQ ID NO:33):
GACTAACCTTGATTCCACTGGAGACACTTGACCTCTACCGCGCCAGTCCTCCGATTGACTGAGTCG
PCR primer 4 (SEQ ID NO:34):
GACTAACCTTGATTCCACTGGAGGGATCCGCGCCCATGATTATTATCG

Fragment 3: human growth hormone (hGH) (BsrDI)
Template: pVLCNOVhGH plasmid DNA (SEQ ID NO:21)
PCR primer 5 (SEQ ID NO:35):
GACTAACCTTGATTCCAGCAATGTCGGTTAGCTTGTTTCTTTACTGTTTGTC
PCR primer 6 (SEQ ID NO:36):
GACTAACCTTGATTCCAGCAATGTTAGGACAAGGCTGGTGGGCACTGG

Fragment 4: SV40 early promoter-neomycin phosphotransferase
Template: VLMB plasmid (SEQ ID NO:23)
PCR primer 7 (SEQ ID NO:37):
GACTAACCTTGATTCCACTGGAGGGTCGACCCTGTGGAATGTGTGTCAG
PCR primer 8 (SEQ ID NO:38):
GACTAACCTTGATTCCACTGGAGAATCTCGTGATGGCAGGTTGGGCGT

Fragment 5: MLV(+)PBS-U3-R-U5
Template: VLMB plasmid (SEQ ID NO:23)
PCR primer 9 (SEQ ID NO:39):
GACTAAGGTTGATTGGACTGAAGAGATTTTATTTAGTGTGGAGAAAAAGGGGGG
PCR primer 10 (SEQ ID NO:40):
GACTAAGCTTGATTGCACTGAAGCCCCCAAATGAAAGAGCCCCGCTGAGG

Fragment 6: pBR322 origin of replication
Template: VLMB plasmid (SEQ ID NO:23)
PCR primer 11 (SEQ ID NO:41):
GAGTAAGCTTGATTGGACTGGAGCCGGGAGGGAATTGGTAATGTGGTGC
PCR primer 12 (SEQ ID NO:42):
GAGTAAGGTTGATTGCACTGGAGTTGTCGAGGCGGCGGATGTCGGCG



Procedure: The twelve primers were prepared as follows. Oligonucleotides were
synthesized with trityls off. After deprotection and lyophilization, the
samples were resuspended in 5 microliters deionized formamide and loaded onto a
polyacrylamide gel (12% polyacrylamide, 250 V). The samples were excised under short
wave UV irradiation and eluted overnight in 600 microliters of sample elution buffer (0.5 M
ammonium acetate, 10 mM Mg acetate, 1 mM EDTA, 0.1% SDS). The contents were loaded
onto a BioRad Chromatography column (Cat # 732-6008) and centrifuged into an Eppendorf
tube at low speed (2000 RPM, 5 min). After washing the column with 500 microliters TE
buffer (10 mM Tris, 1 mM EDTA), pH 8.0 and recentrifugation (2000 RPM, 5 min), the
pooled eluate was ethanol precipitated, washed with 100% ethanol, resuspended in TE buffer
and quantitated by spectrophotometry of a small sample, which was then discarded.

Fragments were cleaned using the Qiaquick PCR cleanup procedure. The
fragments were digested with their respective class IIS restriction enzyme. The digested
fragments were run on 1% agarose gels, and the fragments were excised and cleaned using
the Qiaquick gel cleanup procedure. Fragments were combined in an equimolar mixture and
ligated overnight at 4 °C with T4 ligase and ATP. An analytical gel was run with the ligated
DNA, as well as with controls including unligated fragments and ligated fragments with a
single fragment missing. As opposed to the controls, the complete ligation included bands
equivalent to the full-length supercoiled monomer (referred to as GENSA 981, SEQ ID
NO:29), as well as bands possibly representing multimers (up to six bands were observed).

In order to assess the efficiency of the method, eleven nanograms of DNA
were transfected into SCS1 supercompetent cells. Thirteen kanamycin-resistant colonies
were harvested, and plasmid DNA preps indicated that 10 of the thirteen appeared to be the
correct length. All ten gave the expected bands when digested with PstI, SnaBI, and BamHI.
1.35 µg of the ligated DNA was purified by phenol-chloroform-isoamyl alcohol
extraction, followed by two extractions with chloroform-isoamyl alcohol, and was
precipitated in ethanol. The DNA was washed in 70% ethanol and re-suspended in 50 µl of
sterile phosphate buffered saline (for transfection). The DNA was transfected (using the
Qiagen Superfect protocol) into HTaml (amphotropic human helper cells). 24 h after
transfection, the target cells were washed and fresh culture media was added. 48 h after
transfection, the supernatant from the vector producer cells was filtered (0.45 µm, Nalgene)
and transferred to PG13 helper cells (ATCC) and HT1080 human fibrosarcoma cells. This
procedure was repeated after 72 h. 48 h after transduction, recipient cells were started on
G418 drug selection (500 µg/ml). The appearance of G418 drug-resistant colonies on
transduced PG13 and HT1080 cells after 6 days of selection indicated successful
transmission via retrovirus particles. The transfected HTam cells were also selected with G418.
After six days of drug treatment, 45 colonies of resistant cells were counted. Thus, the six
fragment gene assembly was effectively transmitted and expressed as either a DNA
(transfection) vector or a retro-vector.
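The colony counts reported above can be reduced to two summary figures. The short calculation below is only a worked restatement of the reported numbers (13 colonies from 11 ng of ligated DNA, 10 of them of the expected length and digest pattern). The variable and function names are illustrative.

```python
COLONIES_RECOVERED = 13   # kanamycin-resistant colonies harvested
COLONIES_CORRECT = 10     # colonies with the expected length and digest pattern
DNA_NG = 11.0             # nanograms of ligated DNA transformed


def correct_fraction(correct, total):
    """Fraction of recovered colonies carrying a correctly assembled vector."""
    return correct / total


def colonies_per_microgram(colonies, dna_ng):
    """Colonies obtained per microgram of ligated DNA."""
    return colonies / dna_ng * 1000.0


if __name__ == "__main__":
    fraction = correct_fraction(COLONIES_CORRECT, COLONIES_RECOVERED)
    yield_per_ug = colonies_per_microgram(COLONIES_RECOVERED, DNA_NG)
    print(f"correct assemblies: {fraction:.0%}")
    print(f"colonies per microgram: {yield_per_ug:.0f}")
```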


Example 3 

Design and Construction of Single LTR Vectors 

Background: In order to manipulate the interior of the VL30 LTR sequences using a
promoter rescue technique, single LTR vectors were constructed. The mouse VL30 element
NVL-3 was used as the starting material as it is constitutively and abundantly expressed in
most mouse tissues. Single LTR vectors are circular and behave as if they contained two
LTRs. Thus, in these vectors RNA transcription begins at the start of the R region (see Fig.
3B), and continues through the polyadenylation site after completing the second round of
transcription of the R sequences (Fig. 3A). In previous studies, these vectors were expressed
transiently in vector producer cells and the DNA did not integrate into cell DNA as a standard
two LTR vector. Therefore, the vectors were usually passed to a second complementation
helper cell line via retroviral transduction of the vector RNA transcribed in the first helper
cell. This process resulted in the vector regenerating a correct (two LTR) structure upon
integration into the recipient cell DNA.

Experimental method: The plasmid pNVL-3 (SEQ ID NO:25, kindly provided by Dr. J.
Norton, Manchester, UK), containing a complete copy of the NVL-3 (mouse VL30) genome
(Adams et al., 1989), was digested with XhoI (which cuts in the LTRs), releasing the 4.27 kb
VL30 genome with one copy of the LTR. This fragment was circularized using T4 DNA
ligase and ATP. The circular DNA was linearized by digestion with SnaBI, 187 bp from the
3'-LTR. A 2.3 kb fragment containing the SV40 virus early region promoter and the
aminoglycoside phosphotransferase (neo) gene, together with the pBR322 plasmid origin of
replication, was excised from the BAG retrovirus vector (Price et al., Proc. Natl. Acad. Sci.
84:156-160, 1987, kindly provided by C. Cepko, Cambridge, MA; BAG is also obtainable
in a retrovirus helper cell line from American Type Culture Collection (ATCC), Rockville,
MD) by digestion with XhoI and BamHI. This fragment was blunted with T4 DNA
polymerase and dephosphorylated with calf intestinal alkaline phosphatase (CIP). The
fragment was then ligated to the linearized SnaBI fragment of NVL-3. The resulting plasmid
(containing a circularly permuted NVL-3 genome with the SV-neo-ori region) was designated
VLSN02 (SEQ ID NO:30).

In order to facilitate the switching of LTR sequences by means of the class IIS
enzyme BpmI, VLSN02 was digested with BpmI (six sites). The region containing four
BpmI sites was removed and replaced with a 19 bp linker (SEQ ID NOS:51 and 52, see
below), 921 bp beyond the LTR. The linker contained SnaBI, ClaI and BamHI cloning
sites.

Linker (top strand): 5'-TACGTATCGATGGATCCGA-3' (SEQ ID NO:51)
Linker (bottom strand): 5'-GGATCCATCGATACGTAAG-3' (SEQ ID NO:52)

The remaining two of the BpmI sites had complementary ends, which
permitted their ligation and resulted in eradication of all BpmI sites within the resulting
vector VLSN03 (SEQ ID NO:20).
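The two linker strands listed above can be checked for base pairing and for the three cloning sites. The sketch below is an illustration, assuming the standard recognition sequences TACGTA (SnaBI), ATCGAT (ClaI) and GGATCC (BamHI), which the text does not spell out. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

TOP = "TACGTATCGATGGATCCGA"      # SEQ ID NO:51, written 5' to 3'
BOTTOM = "GGATCCATCGATACGTAAG"   # SEQ ID NO:52, written 5' to 3'

# Recognition sequences assumed from general enzyme catalogues.
SITES = {"SnaBI": "TACGTA", "ClaI": "ATCGAT", "BamHI": "GGATCC"}


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_overhangs(top, bottom, overhang=2):
    """Return the two 3' overhangs if the remaining bases pair, else None."""
    if revcomp(top[:-overhang]) != bottom[:-overhang]:
        return None
    return top[-overhang:], bottom[-overhang:]


def site_positions(strand, sites):
    """Map each enzyme name to the 0-based start of its site (-1 if absent)."""
    return {name: strand.find(site) for name, site in sites.items()}


if __name__ == "__main__":
    print(duplex_overhangs(TOP, BOTTOM))
    print(site_positions(TOP, SITES))
```

With these inputs the strands pair over 17 bases and each carries a 2-base 3' extension, the same end geometry the text attributes to the class IIS enzymes used.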

In order to facilitate reporter/therapeutic gene function, a 3.7 kb fragment
containing the internal ribosome entry site (IRES) from encephalomyocarditis virus, together
with the β-galactosidase reporter gene, was excised from the plasmid pVLSAIBAG (kindly
provided by Mr. James Grunkemeyer, Omaha, NE) by means of a partial digestion of the
plasmid with BamHI. This region was inserted into the BamHI site of VLSN03, resulting in
the vector VLSNOSIB (SEQ ID NO:14).

A second reporter construct, pVLSNOG (5774 bp, SEQ ID NO:19), containing
the green fluorescent protein (GFP; Clontech, Palo Alto, CA) gene, was constructed by
inserting a fragment (800 bp) from plasmid pGFP-N1. This sequence, containing
the GFP gene, was treated with mung bean exonuclease and inserted into the unique SnaBI
site of pVLSN03.

In order to enhance GFP fluorescence from the reporter plasmid pVLSNOG,
the serine-65 codon in the GFP gene was mutated into threonine by a site-directed
mutagenesis procedure with the Transformer Site-Directed Mutagenesis kit from Clontech.
A BpmI site in the GFP gene (threonine-9) was mutated at the same time without changing
the amino acid (ACT to ACA). The resulting plasmid was pVLSNOGM (SEQ ID NO:18).

An NcoI-XhoI fragment (585 bp) from plasmid pGlIL2EN (kindly provided
by Dr. Steven Rosenberg, Bethesda, MD), containing the internal ribosome entry site (IRES)
from encephalomyocarditis virus (EMCV), was inserted into the ApaI site upstream of the
GFP gene in pVLSNOGM, resulting in pVLSNOGMI (SEQ ID NO:17). Both insert and
plasmid fragments were blunted with mung bean exonuclease. One variant version of
pVLSNOGMI with an IRES tandem dimer was also constructed and designated
pVLSN0GMI2 (SEQ ID NO:16).

Oligonucleotides (SEQ ID NOS:53 and 54) containing a splice acceptor (SA) of
AKV virus (in bold) were inserted into pVLSNOGMI at the unique SacII site just before the
IRES, resulting in pVLSNOGMIS (SEQ ID NO:15).

Oligo (SEQ ID NO:53):
5'-GGCCGCTAACTAATAGCCCATTCTCCAAGGTACGTAGC-3'
Bottom oligo (SEQ ID NO:54):
3'-CGCCGGCGATTGATTATCGGGTAAGAGGTTCCATGCAT-5'

Recovery of LTR promoter sequences from mouse CD4+ T-helper cells 

In order to facilitate the recovery of VL30 promoter sequences expressed in
mouse T-helper cells, a mouse CD4+ T-helper cell cDNA library (Stratagene, San Diego, CA,
Catalog # 937311) was screened by plaque hybridization. Approximately 2 x 10^4
bacteriophage λ-ZAP clones were plated on a lawn of E. coli cells according to the
manufacturer's instructions. Two nylon filters were sequentially layered onto the lawn of E.
coli cells and bacteriophage. The filters were hybridized to a 32P-labelled (Prime-It RmT
Random Primer Labeling Kit, Stratagene), 4.2 kb internal XhoI fragment of NVL-3
(containing the NVL-3 genome). 55 plaques (or approximately 0.3% of the total phage)
reacted positively on both filters. 18 VL30 cDNA sequences were cloned from the plate
and were used to identify U3 promoters that are actively expressed in the RNA of mouse
T-cells. Five of the 18 clones contained intact U3 sequences, representing four of one
molecular species, named TH1 (SEQ ID NO:2), and one of another species, named TH2
(SEQ ID NO:3), also provided in Fig. 5. TH1 contained approximately 120 bp more DNA
than did TH2. Because TH1 was more abundant (4 out of 5 clones), the additional sequences
in the enhancer region were implicated as a possible reason for the stronger expression in
mouse T cells. Examination of the known and putative transcription factor binding sites in
the VL30 LTR (Hodgson, 1996, chapter 4, Fig. 4.2, supra) revealed several interesting
features of TH1 and TH2. First, the extra sequences of TH1 that were missing in TH2
included an extra copy of the enhancer repeat region as well as a potential retinoid
(RAR/RXR) binding site. Several transcription factor binding sites in the enhancer repeat
region that differed between the two elements included: a cyclic 3'-5' AMP response element
(VLCRE, a potential CREB/jun binding site), a serum response element (SRE), and a
potential NF1/IL6 binding site (although there were additional sites for these factors in other
enhancer repeats). These factors could possibly explain why VLTH1 appeared to be
expressed at higher levels, both in the source cells and in transduced cells. Together, the
VL30 sequences represented 0.3% of the mRNA expressed in the T cells, and TH1 appeared
to be the most abundant VL30.

Sequencing Primers:

(SK, SEQ ID NO:49) 5'-CGCTCTAGAACTAGTGGATC (20 mers, Tm 

(T7, SEQ ID NO:50) 5'-GTAATACGACTCACTATAGGG (21 mers, Tm 60°C).
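The 60 °C figure quoted for the T7 primer is what the simple Wallace rule (2 °C per A or T, 4 °C per G or C) gives for that sequence. Whether that rule was actually used here is an assumption; the sketch below only shows the arithmetic, and the function name is hypothetical.

```python
def wallace_tm(sequence):
    """Estimate melting temperature in degrees C with the Wallace rule.

    The rule counts 2 degrees for each A or T and 4 degrees for each G or C.
    It is a rough estimate intended for short oligonucleotides.
    """
    sequence = sequence.upper()
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    if at_count + gc_count != len(sequence):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    t7_primer = "GTAATACGACTCACTATAGGG"  # SEQ ID NO:50 as listed above
    print(len(t7_primer), "nt, estimated Tm", wallace_tm(t7_primer), "C")
```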

Seamless Rescue of T cell promoters using class IIS restriction enzymes

Two sets of primers containing offset BpmI restriction sites were designed and
synthesized. One set was for amplification of the plasmid sequences, and another was for the
amplification of the inserts.

Insert primers: (BpmI site bold)

ITA (43 mer, Tm: 67.2 °C, SEQ ID NO:45)
CGATCCACTGGAGCTCGGAGCCCACCCCCTCCCATCTAGAGGT

ITB (43 mer, Tm: 66.3 °C, SEQ ID NO:46)
CGTCCTCCTGGAGAGCACAGGGTAGAGGAGTCTCGACGGTCAG

Vector primers: (BpmI site bold)

VLA (43 mer, Tm: 68.2 °C, SEQ ID NO:47)
CGCAACCCTGGAGACCTCTAGATGGGAGGGGGTGGGCTCCGAG

VLB (43 mer, Tm: 66.3 °C, SEQ ID NO:48)
GCAGGACCTGGAGCTGACCGTCGAGACTCCTCTACCCTGTGCT
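The offset design of these four primers can be checked directly from the sequences. The sketch below is an illustration under stated assumptions: the BpmI site is taken to be CTGGAG with cleavage 16 bases downstream on the site strand and 14 on the opposite strand (figures from general enzyme catalogues, not from this text), and the helper names are hypothetical. It confirms that the 30-base region after the site in each vector primer is the reverse complement of the matching insert primer region, and it derives the 2-base 3' overhang left on each large fragment after digestion.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

BPMI_SITE = "CTGGAG"          # assumed recognition sequence
TOP_CUT, BOTTOM_CUT = 16, 14  # assumed cut offsets downstream of the site

PRIMERS = {
    "ITA": "CGATCCACTGGAGCTCGGAGCCCACCCCCTCCCATCTAGAGGT",
    "ITB": "CGTCCTCCTGGAGAGCACAGGGTAGAGGAGTCTCGACGGTCAG",
    "VLA": "CGCAACCCTGGAGACCTCTAGATGGGAGGGGGTGGGCTCCGAG",
    "VLB": "GCAGGACCTGGAGCTGACCGTCGAGACTCCTCTACCCTGTGCT",
}


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tail(primer):
    """Bases that follow the first BpmI site in the primer."""
    return primer[primer.index(BPMI_SITE) + len(BPMI_SITE):]


def retained_overhang(primer):
    """3' overhang (written 5' to 3') left on the large fragment.

    The small piece carrying the site is discarded; the large fragment keeps
    the opposite-strand bases between the two cut positions.
    """
    return revcomp(tail(primer)[BOTTOM_CUT:TOP_CUT])


if __name__ == "__main__":
    for insert_name, vector_name in (("ITA", "VLA"), ("ITB", "VLB")):
        insert_end = retained_overhang(PRIMERS[insert_name])
        vector_end = retained_overhang(PRIMERS[vector_name])
        print(insert_name, insert_end, vector_name, vector_end,
              "compatible:", revcomp(insert_end) == vector_end)
```

Because the shared region is 30 bases long and the cuts fall 16 and 14 bases in, both digests expose the same central base pair step, so each junction re-forms the original sequence with no added bases, and the two junctions carry different overhangs.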

To amplify vector sequences more efficiently, vector templates were shortened
by deleting marker genes from the vectors. pVLSNOSIB (SEQ ID NO:14) was cut with KpnI
and a 4201 bp fragment containing the β-gal gene was removed. The remaining vector has
3923 bp.

The U3-promoter inserts (357 bp for TH1 and 240 bp for TH2) were PCR-
amplified from the TH1 and TH2 promoters with primers ITA and ITB. The vector cassettes
(~4.2 kb for pVLSNOSIB and ~3.7 kb for pVLSNOGMIS) were PCR-amplified from the
shortened vector templates using primers VLA and VLB (supra). The PCR amplification
was done with high-fidelity Pfu DNA polymerase from Stratagene (La Jolla, CA). The
amplified products were gel-purified (1% agarose gel). The inserts were then cut with BpmI
to produce complementary ends. The vector cassette products were phosphorylated with
PNK, then circularized with T4 ligase, and transformed into SCS110 cells. Recovered
plasmids were then digested with BpmI and treated with CIP to produce complementary
ends. BpmI-treated inserts and vector cassettes were ligated, and T-cell tissue-specific VL30
vectors VLTH1 and VLTH2 were produced. The marker β-gal gene and GFP gene were put
back into those vectors at the original unique sites KpnI and SalI, respectively.

Transmission and expression of single LTR vectors and T cell U3 sequences

Vector DNA constructs were transfected into GP+E86 retroviral helper cells
(Markowitz et al., 1988, supra) using the Lipofectamine protocol (Life Technologies,
Gaithersburg, MD). The culture media from these cells (supernatant), containing defective
transducing particles (72 h post-transfection), was transmitted to PA317 (Miller, US Patent,
cited supra) amphotropic helper cells, using Lipofectamine to enhance transduction efficiency
(Hodgson et al., 1996. Synthetic Retrotransposon Vectors and Gene Targeting, pp. 3-14, in:
Felgner et al., eds. Artificial Self-Assembling Systems for Gene Delivery. American Chemical
Soc. Books, Washington, D.C.). A similar procedure was used to transmit VLTH1 and
VLTH2 to the PG13 helper cell line (Miller et al., 1991. J. Virol. 65:2220-2224). 24 h post-
transfection, the recipient cells were selected with the drug G418 (500 µg/ml, 2 weeks) to
enrich for stably transduced cell populations.

All of the single LTR vectors, including VLTH1 and VLTH2, were transmitted
by this method, indicating that single LTR vectors can be used for promoter switching and
yet revert to dual LTR vectors after a single passage. Vectors VLSN02, VLSN03, and
VLSNOSIB were then titered on NIH 3T3 cells (using the PA317 vector producer cell lines).
VLTH1 and VLTH2 vectors were titered on human HT1080 cells (PG13 cell lines).
Surprisingly, all of the single LTR vectors were transmitted effectively. However, the titers of
stably transduced TH1 and TH2 cell lines were 5.5 x 10^-1.1 x 10^ TU/ml, compared to 0.4-
3.0 x 10' TU/ml for the VLSN02, VLSN03 and VLSNOSIB cell lines. Thus, switching
from the NVL-3 transcriptional promoter (originally isolated from NIH 3T3 fibroblast cells)
to VL30 promoters derived from T helper cells appeared to have a negative effect on RNA
expression in fibroblast cells, as determined by the transmissibility of the RNA.

In order to study the usefulness of rescued promoters as DNA transfection
vectors (as opposed to retro-vectors), VLSNOSIB, VLTH1 and VLTH2 were also transfected
into a number of cell lines (using Lipofectamine), including NIH 3T3, PA317, GP+E86,
PG13, HT1080, SW480 and HeLa (available from ATCC). RNA expression in these cell
lines is shown in Table 4, wherein gene expression from the LTR promoter (as determined by
β-gal staining) is normalized to VLSNOSIB (100).



Cell line:   NIH 3T3   PA317   GP+E86   PG13   HT1080   SW480   HeLa
Vector:
VLSNOSIB     100       100     100      100    100      100     100
VLTH1        39.3      18.7    0.1      21     25.5     156     156
VLTH2        28.6      7.1     5.5      11.5   46.8     82      156

Table 4. Transient expression of a β-gal marker gene by three VL30 promoters: NVL-
3 (VLSNOSIB), VLTH1 and VLTH2. Cells were transfected using the Lipofectamine
procedure. Total blue cells were counted from each well in 6-well plates, and the number of
blue cells from VLSNOSIB was normalized to 100%.

The expression of both the VLTH1 and VLTH2 promoters was significantly
reduced compared to VLSNOSIB in cell lines of fibroblastic origin, whereas in SW480
colorectal cancer cells and HeLa cells, it was comparable to or better than VLSNOSIB (the
NVL-3 promoter). However, VLSNOSIB was expressed poorly in the non-fibroblastic cell
lines, so a direct comparison was difficult to interpret. Unfortunately, the human T cell lines
(Jurkat and MOLT4 [obtained from ATCC]) were not transfected by Lipofectamine, and they
were poorly transduced by VLTH1 and VLTH2 retro-vectors. In the Jurkat and MOLT4 cells
transduced with VLTH1 and VLTH2, only a small percentage (1-10%) of cells that were
stably transduced by the vectors stained positively for β-gal expression. However, the marker
gene (neo) continued to be expressed from an internal promoter, as evidenced by drug
selection.

Taken together, the results demonstrated: 1) the ability of the promoter rescue
technique to seamlessly capture functional transcriptional promoters from specialized cells; 2)
the ability of single LTR vectors to introduce the rescued promoters into standard transducing
vectors; 3) the ability of the rescued promoters to be expressed at differing levels in several
different cell types, including T cells; and 4) the ability of screening and selection to
establish the efficacy, or lack thereof, of individual promoter sequences.

Although the general method of promoter rescue was demonstrated by the
foregoing experiments, the titers obtained from the sLTR VL30 vectors may not be useful
where selection systems are not available.

Additional experimentation led to the development of a chimeric packaging
signal, combining the essential packaging signal (Ψ) from Moloney murine leukemia virus
and the enhanced packaging signal (Ψ+) from a mouse VL30 element. A vector embodiment
of this packaging system is VLMB (SEQ ID NO:23). One advantage of the chimeric
packaging system was the elimination of retroviral gag gene sequences that were present in
previous high-titer MLV-based vectors (viral gag sequences contribute to the generation of
replication competent retrovirus outbreaks). The titers of VLMB-based vectors ranged from
approximately 1 x 10^ to 4 x 10* TU/ml.

Construction of a cloning vector for promoter rescue

Using pVLSNOGMIS as a template and primers (SEQ ID NOS:28 and 68), a
6.4 kb plasmid fragment was PCR amplified (using Hot Start Ampliwax PCR Gems 100,
Perkin Elmer). 30 cycles of PCR were performed by following the manufacturer's
instructions, with the following input conditions: lower reaction, 80 °C, 5 min; then add
upper reaction and template; 96 °C, 1 min. Each reaction vial contained 50 ng template, 0.5
µM each primer, 200 µM dNTPs and 5 U (2 µl) Pfu polymerase (Stratagene, La Jolla, CA).
The 30 repeating cycles were: 96 °C, 45 sec; 50 °C, 45 sec; 75 °C, 1 min. A final incubation
at 75 °C for 10 min was followed by a hold at 4 °C. After amplification, the reactions were
purified using Qiaquick PCR Purification Kits (Qiagen). The PCR products were digested
with PacI, heat inactivated (65 °C, 20 min) and ligated together using T4 DNA ligase
(overnight at 4 °C in a 5 µl vol). The ligated DNA was transfected into SCS110 E. coli cells
(Stratagene) with kanamycin (50 µg/ml) antibiotic added to the agar plates. The cells were
dcm-, dam- (to prevent methylation of BpmI sites). The resulting plasmid, pVLBPGN (SEQ ID
NO:1, Figs. 2 and 3), has a deletion in the U3 region of the LTR. A linker containing a central
PacI site flanked by two outwardly-digesting BpmI sites occupies the site of the deleted U3
sequences. The BpmI sites enable the plasmid to be digested with BpmI, resulting in two 2 bp
3'-overhanging ends that are complementary to the U3-derived RT-PCR inserts described
below. The digested plasmid was purified free from the intervening linker sequences from an
agarose gel after digestion with BpmI, using the Qiaquick gel purification kit (Qiagen).

Procedure for amplification of liver U3 promoter region

Purified mouse liver total tissue RNA was purchased from Ambion, Inc.
(Austin, TX). Total liver RNA was treated with RQ1 RNase-free DNase (Promega, Madison, WI).
Using the Perkin Elmer Gene Amp thermostable rTth reverse transcriptase RNA PCR kit (P/N
N808-0069), the following conditions for RT-PCR were used: RT-PCR A, 70 °C (hot start);
RT-PCR B, 95 °C, 60 sec, then 35 cycles (95 °C, 10 sec; 55 °C, 15 sec), then a final 55 °C
incubation for 7 min, then 4 °C and hold. Additional conditions were: primer concentration
0.15 micromolar, template 100 ng/reaction, dNTPs 200 micromolar (final) and MgCl2 3.5
mM (final). The primers for insert amplification were SEQ ID NOS:28 and 68.

The amplified U3 sequences were purified using Qiaquick. The pVLBPGN
plasmid was digested with BpmI, isolated from a 1% agarose gel and purified using the
Qiaquick method. The purified U3 sequences were ligated at 1:2, 1:4 and 1:6 molar ratios of
VLBPGN plasmid:insert using T4 DNA ligase and a 5 microliter reaction volume overnight
(100 ng plasmid : 16 ng insert = 1:1 molar ratio). 1 microliter of each ligation reaction
was transformed into E. coli SCS110 competent cells (Stratagene). 26 colonies were
recovered in total. Out of 23 clones grown overnight in the presence of kanamycin, 20 had
sequences that appeared to be mouse VL30 sequences, representing 10 different VL30
species (Fig. 6, SEQ ID NOS:4-13). One of these (Hep 10, SEQ ID NO:13) was transiently
transfected into Hep G2 liver hepatocellular carcinoma cells. 48 h after transfection, intense
GFP fluorescence was observed, indicating strong expression of the Hep 10 U3 promoter
region.
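The insert masses implied by the stated ligation ratios follow from the quoted equivalence of 100 ng plasmid to 16 ng insert at a 1:1 molar ratio. The sketch below is a worked restatement only. It assumes the plasmid mass is held at 100 ng in every reaction, which the text does not state, and the names are illustrative.

```python
PLASMID_NG = 100.0           # plasmid mass at the quoted 1:1 reference point
INSERT_NG_EQUIMOLAR = 16.0   # insert mass quoted as equimolar to 100 ng plasmid


def insert_mass_ng(inserts_per_plasmid, plasmid_ng=PLASMID_NG):
    """Insert mass (ng) for a given insert:plasmid molar ratio.

    Mass scales linearly with the molar ratio and with the plasmid mass.
    """
    return INSERT_NG_EQUIMOLAR * inserts_per_plasmid * (plasmid_ng / PLASMID_NG)


if __name__ == "__main__":
    for ratio in (2, 4, 6):
        print(f"1:{ratio} plasmid:insert ->", insert_mass_ng(ratio), "ng insert")
```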






Example 4 

Creating a combinatorial library of mouse VL30 U3 sub-regions. 



Using Fig. 7 and Hodgson, 1996, supra. Fig. 4.2 as a guide, the following three sub- 
regions of the VL30 U3 region were empirically established: Distal (1); medial (2); and 
proximal (3). Peaks of similarity were used to guide the following choice of primers: (+) 
primer binding site-5'-LTR boundary; -80 bp (defines sub-region 1); -80-210 bp (sub-region 
30 2); -210-430 (sub-region 3). The following primers were selected to amplify the vector 
VLBPGN or a similar VL30, NVL-3 LTR-containing vector: 

PI (going left from the 5'-end of the LTR to amplify the plasmid) 
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(SEQ ID NO:55) 

GACTAACCTTGATTCCACTGGAGTTTT(CT)(CT)ATTCTTCATO 
TTCTT 

P2 (going right from the 3*-end of the promoter region to amplify the plasmid) 
(SEQ ID NO:56) 

GACTAACCTTGATTCCACTGGAGAATCTGGACCAATTCTATATAAGCCTG 
TGAAAAATTT 

The six primers selected to amplify the inserts are as follows: 

Fragment 1, primer 1 (going right from the LTR terminus into U3) (SEQ ID NO:57) 

GACTAACCTTGATTCCACTGGAGAAGAAGAAGTGGGGAATGAAGAA 

Fragment 1, primer 2 (going left from the end of fragment 1) (SEQ ID NO:58) 

GACTAACCTrGATTCCACTGGAGATCTCTAGATGGGAGGGG(GT)(CT)GGG 

CTC 

Fragment 2, primer 1 (going right from the left end of fragment 2) (SEQ ID NO:59) 

GACTAACCTTGATTCCACTGGAGCTCGGAGCCCACCCCCTCCCATCT 

Fragment 2, primer 2 (going left from the right end of fragment 2) (SEQ ID NO:60) 

GACTAACCTTGATTCCACTGGAGGGAGGCCCTTATCTCAAAAATGTT 

Fragment 3, primer 1 (going right from the left end of fragment 3) (SEQ ID NO:61)

GACTAACCTTGATTCCACTGGAGTCTAAGAACATTTTTGAGAT^ 

T 

Fragment 3, primer 2 (going left from the right end of fragment 3) (SEQ ID NO:62) 
GACTAACCTTGATTCCACTGGAGTCACAGGCTTATATAG(TG)AAA
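
By way of illustration only, the degenerate positions in the primers above can be expanded into the individual oligonucleotides they represent. The sketch below assumes that a parenthesized pair such as (GT) denotes a position at which either nucleotide may occur, and that a primer printed on two lines is read as one continuous sequence; the function name expand_degenerate is hypothetical and is not part of the disclosed method.

```python
import itertools
import re


def expand_degenerate(primer: str) -> list[str]:
    """Expand '(XY)' alternation groups into every concrete sequence."""
    # Each match is either a parenthesized group or a run of plain bases.
    tokens = re.findall(r"\(([ACGT]+)\)|([ACGT]+)", primer)
    choices = [list(group) if group else [literal] for group, literal in tokens]
    return ["".join(parts) for parts in itertools.product(*choices)]


# 5' tag shared by the insert primers listed above
TAG = "GACTAACCTTGATTCCACTGGAG"
# Fragment 1, primer 2 (SEQ ID NO:58), read as one continuous sequence
FRAG1_PRIMER2 = "GACTAACCTTGATTCCACTGGAGATCTCTAGATGGGAGGGG(GT)(CT)GGGCTC"

variants = expand_degenerate(FRAG1_PRIMER2)
for seq in variants:
    print(seq)
```

Two two-fold degenerate positions yield four distinct oligonucleotides, all of which carry the same 5' tag.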

100 ng of genomic DNA from Mus musculus is used as a template (the mouse genome
bears 100-200 copies of VL30 elements). Standard PCR procedures for Pfu polymerase are
used. Fragments are amplified through 35 rounds of PCR, sufficient to obtain amplification
from single-copy genomic DNA. Samples of Qiagen column-purified DNA are examined on
analytical agarose gels to determine the approximate size. The remainder of each reaction is
digested with the appropriate enzyme and run on an acrylamide or agarose gel. The digested
fragments are purified by standard gel purification procedures and are ligated to the plasmid
fragment at an equimolar ratio of the four PCR fragments (three inserts and one plasmid).
The ligation mix is transformed into E. coli SCS1 and is grown on kanamycin. The number of
colonies is used to establish the size of the combinatorial library, and the pooled colonies are
grown in E. coli and the DNA is harvested en masse. A dozen or more colonies are
characterized by DNA sequencing to determine the approximate fidelity of the reaction. A
library of 1,000 or more, but preferably 100,000 or more, members is used for combinatorial
screening procedures.
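
As a rough guide to the library sizes mentioned above, the theoretical complexity of a three-sub-region library and the number of colonies needed to sample it can be estimated as sketched below. The figure of 10 variants per sub-region is purely hypothetical, the function names are illustrative, and the sampling estimate assumes that every combination is equally likely to be recovered, which a real ligation may not satisfy.

```python
import math


def library_complexity(variants_per_subregion: list[int]) -> int:
    """Number of distinct combinations when one variant is chosen per sub-region."""
    return math.prod(variants_per_subregion)


def clones_for_coverage(complexity: int, probability: float) -> int:
    """Colonies needed so that a given member is present with `probability`.

    Uses N = ln(1 - P) / ln(1 - 1/complexity), assuming equal representation.
    """
    if complexity <= 1:
        return 1
    n = math.log(1.0 - probability) / math.log(1.0 - 1.0 / complexity)
    return math.ceil(n)


# Hypothetical example: 10 variants of each of the three U3 sub-regions
complexity = library_complexity([10, 10, 10])
print("possible combinations:", complexity)
print("colonies for 95% chance of a given member:",
      clones_for_coverage(complexity, 0.95))
```

Under these assumptions, a library several-fold larger than its theoretical complexity is needed before any particular combination is likely to be present.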

Screening the combinatorial libraries for expression in specific cell types using a 
replication defective helper virus 

The U3 library DNA is transfected into the target cells in which expression is desired.
Along with the library, approximately 25% of the total DNA should include retroviral helper
sequences. The latter sequences can be a helper plasmid (such as pPAM3, Miller et al., US
Patent 4,861,719). The virus is amphotropic, permitting it to infect most human cells. The
RNA from individual clones that are transcribed in the target cells will be packaged into
retroviral virions made by the helper virus, and the virions can be harvested as the cell-free
filtrate (0.45 µm) from the vector producer cells. This virus (containing the expressed
sequences) can be transmitted to fresh target cells that do not contain helper virus. At 48
hours after passage, the DNA form of the transcriptionally active clones will be integrated in
the recipient cells, and these transcriptionally active loci will produce more RNA and protein.
After G418 drug selection to increase the proportion of cells expressing the vector sequences,
helper virus DNA is again transfected into the recipient cells, transforming them into vector
producer cells. The virus from these cells should contain increased amounts of the RNA from
clones that are transcriptionally active in those cells. Passage of the virus is continued for two
or three rounds to permit recombination and mutation to take place, enhancing the effect of in
vitro evolution of promoters. The actual degree of enhancement attainable at each step is
illustrated in Table 2 (supra). After several passages, the actual level of RNA expressed by
several clones is determined by RNA blotting, or by the amount of a reporter gene expressed
as protein (determined visually or by the appropriate assay). Because human cells do not
naturally contain VL30 DNA or RNA, the sequences that remain in the human cells are those
with the most transcriptionally active promoters. These sequences can be amplified and
re-cloned using the methods of the instant invention, or they can be rescued by virus
packaging, reverse transcribed by the endogenous reverse transcriptase reaction, and grown
as plasmids (due to their plasmid origin of replication and the selectable kanamycin marker
gene).

In addition to using a replication-defective helper virus, such as the clone
pPAM3, it is also possible to use a replication-competent retrovirus, such as Moloney murine
leukemia virus, to passage the library. For use in human cells, however, the virus should have
a tropism that is compatible with human cells (gibbon ape leukemia virus and amphotropic
[4070A] murine retroviruses are acceptable).

In addition to being useful for generating active transcriptional promoters de
novo, a small variation on the above procedures may enable the isolation of hormone-responsive
promoters. In this variation, the cells are treated with the hormone (which could be a steroid,
a peptide hormone known to affect the cells, a drug, a drug agonist or antagonist, etc.) during
passage. After isolation of surviving VL30 vector-containing cells, individual clones of
drug-resistant cells are tested for reporter gene expression with and without drug treatment to
determine relative protein expression. Likewise, RNA expression can be determined by blot
analysis or a similar method. A useful list of known VL30 responses to pharmacological
agents is given in Fig. 4.2 of Hodgson, 1996, supra, and can be used as a guide to help assess
the potential agents known to have an effect on VL30 transcription.

Once the transcriptional promoters with the known specificity have been
obtained, they can be used to obtain expression of genes from a variety of types of vectors.
For example, in addition to retrovirus particles, the promoters can be incorporated into all
other major groups of vectors: adenoviruses, herpes simplex virus vectors, DNA transfection
vectors, etc. It will be apparent to persons of ordinary skill in the art that similar
combinatorial libraries can also be used to screen for characteristics other than transcription
activity in a particular cell. For example, combinatorial libraries of complementarity
determining regions (CDRs) of antibodies or T cell receptors can be so screened using
antibody screening methods, such as the phage display screening method (Pharmacia,
Milwaukee, WI). Thus, the methods of this invention, and particularly their combinatorial
simplicity, are a significant improvement over many in vivo recombination methods,
including those of Stemmer (US Patent 5,605,793; 1997), that have been described for the
production of CDR combinatorial libraries.

Example 5
Gene Assembly Line 

From the above examples of 3- and 6-fragment gene self-assemblies, it is
evident that assembly of genes by means of gene amplification, the use of offset restriction
enzymes, and the incorporation of unique, non-palindromic ends is a highly efficient process
compared to conventional cloning methods. However, in addition to the considerations
already discussed, it will be apparent to a person of ordinary skill in the art that the various
procedures, protocols, methods and materials of the instant invention become more difficult to
use as the number of fragments increases. For example, if the efficiency of combining each
fragment in an assemblage is 99%, then the overall efficiency of combining ten fragments
will be 90%, the efficiency of combining 100 fragments will be 37%, etc. Therefore, a small
drop in the efficiency of any step or fragment, or a large increase in the complexity of the
project, will be sufficient to reduce the overall efficiency. Fastidious procedures permit one to
achieve success with more complex projects.
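
The arithmetic behind these figures can be sketched as follows, on the assumption (implicit in the 90% and 37% figures) that each fragment joins independently with the same efficiency. The function name is illustrative only.

```python
def overall_efficiency(per_fragment: float, n_fragments: int) -> float:
    """Overall efficiency when every fragment joins independently."""
    return per_fragment ** n_fragments


# Per-fragment efficiency of 99%, as in the example in the text
for n in (3, 6, 10, 100):
    print(f"{n:>3} fragments: {overall_efficiency(0.99, n):.1%}")
```

The same function shows how quickly a lower per-fragment efficiency, for instance 95%, erodes the yield of a large assembly.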

Foremost in its potential for inducing failure is human error in primer design
where large numbers of fragments are used. Fortunately, the instant invention is suited to
automation of most of the steps. This allows human input to be focused on design, analysis,
and quality control. For the purposes of generating large vectors or chromosomes, it is
desirable to provide an automated environment. One method to achieve this goal is a gene
assembly line.

In a gene assembly line, multiple tasks are controlled by a machine or
machines working together to increase speed and efficiency and to reduce human error. For
example, computer aided design (CAD) and computer aided manufacturing (CAM) are
incorporated and combined with the methods of this invention. The computers accept inputs
in the form of template and primer sequences, together with preferences of regions to be
copied and joined. The preferences include at least the sequences of the primer regions and
information about the known restriction sites and maps of the sequences to be assembled, but
ideally include the entire sequence. The preferences also include the number of sequences to
be joined, the desired Tm for the primers, the list of potential restriction enzymes capable of
offset digestion that are potential candidates for use in the assembly process, the desired end
structures for each fragment terminus, a tag sequence (if any), whether circular or linear ends
are desired, and additional design considerations. The computer algorithm then searches the
sequences to determine the candidate enzymes and specific primers that match the criteria of
the input. Candidates for selection of unique non-palindromic overlaps are selected. The
computer then posts selections or preferences for the type and order of end structures, the
primer binding sites, their potential for primer-dimer and intra-molecular interaction artifacts,
and the potential conflicts with repeat sequences within the templates that could lead to
incorrect polymerization. Based upon the selections made by the operator, the computer then
determines the Tm for each primer, and makes adjustments (with suitable inputs from the
investigator) to achieve a suitable Tm for the appropriate DNA synthesis or gene amplification
reaction. Ideally, the primers should have similar Tm values so that all amplification reactions
can be performed at once with one set of amplification instructions. In reality, it may be
difficult to do this with complex projects. The output of this portion of the program, which can
be in a generic format, such as a Microsoft Excel spreadsheet, is then downloaded to a
computerized oligonucleotide synthesizer, such as the Applied Biosystems 3928 nucleic acid
synthesizer. One advantage of using a computerized synthesizer is its robotic capability to
de-protect and purify the oligonucleotides automatically. In addition, this synthesizer can
accept computerized input.
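
The Tm comparison step described above might be sketched as follows. The specification does not state which Tm formula the program uses; this sketch substitutes the simple rule of thumb of 2 degrees per A or T and 4 degrees per G or C, which is only a crude approximation for primers of this length. It is also an assumption that only the portion of each primer 3' of the common tag anneals to the template. The function names and the 5-degree tolerance are hypothetical.

```python
from statistics import median

# Common 5' tag of the Example 4 insert primers (assumed not to anneal)
TAG = "GACTAACCTTGATTCCACTGGAG"


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm in degrees C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def flag_outliers(primers: dict[str, str], tolerance: int = 5) -> dict[str, int]:
    """Return primers whose estimated Tm is more than `tolerance` from the median."""
    tms = {name: wallace_tm(seq.removeprefix(TAG)) for name, seq in primers.items()}
    mid = median(tms.values())
    return {name: tm for name, tm in tms.items() if abs(tm - mid) > tolerance}


primers = {
    "SEQ ID NO:57": "GACTAACCTTGATTCCACTGGAGAAGAAGAAGTGGGGAATGAAGAA",
    "SEQ ID NO:59": "GACTAACCTTGATTCCACTGGAGCTCGGAGCCCACCCCCTCCCATCT",
    "SEQ ID NO:60": "GACTAACCTTGATTCCACTGGAGGGAGGCCCTTATCTCAAAAATGTT",
}
for name, seq in primers.items():
    print(name, wallace_tm(seq.removeprefix(TAG)))
print("outside tolerance:", flag_outliers(primers))
```

A primer flagged in this way would be a candidate for lengthening or shortening of its annealing region before synthesis.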

The quantity of individual oligos recovered is then determined
spectrophotometrically. It is desirable to purify the oligonucleotides by high performance
liquid chromatography or by polyacrylamide gel. In a preferred embodiment, the
oligonucleotides and templates are then assembled robotically using an automated nucleic
acid handling system such as the Qiagen BioRobot 9600. The BioRobot is capable of
accepting input from a computer and can combine the gene amplification reactions based
upon the assignments of templates, primers and reagents provided in the input. The assembled
reactions are then amplified, for example by PCR. In a preferred embodiment, the PCR heat
block is incorporated into the robotic workspace and genes are assembled robotically with
only minimal human intervention to change buffers, rearrange the platform, change programs,
and the like. The resulting amplified products are also purified by the BioRobot or a similar
robotic device. In a preferred embodiment, the robotic device uses Qiaquick cleanup
procedures, or a similar method, and then assembles restriction endonuclease reactions to
digest the purified gene amplification products. The gene amplification products are loaded
onto a gel and electrophoresed. Human intervention may be necessary to analyze the
products and excise the correct fragments from the gel. At this point, the results are assessed
and missing or incorrectly sized fragments are resynthesized. The robotic device is preferably
used to purify the gel fragments using Qiagen or similar cleanup procedures. After
spectrophotometric quantitation of the purified fragments, the robotic device is preferably
used to assemble the ligation. Ideally the fragments are combined in an equimolar ratio of
1:1. However, it is not necessary to use equimolar ratios in order to achieve gene
self-assembly. For automated gene assembly, it may be desirable not to use equimolar ratios of
input fragments, particularly if this simplifies the task of quantitation. After ligation, the
assemblies can be purified and ethanol precipitated or they can be added to the appropriate
host cells. Automation aids in maintaining the sterility of the reaction.
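
Where equimolar input is wanted, the mass of each fragment can be derived from its length, as sketched below. The conversion assumes an average mass of about 660 g/mol per base pair of double-stranded DNA; the fragment lengths and the 0.05 pmol target are hypothetical values chosen only for illustration.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_for_pmol(pmol: float, length_bp: int) -> float:
    """Mass in ng that contains `pmol` of a double-stranded fragment."""
    # pmol * bp * (g/mol per bp) is a mass in picograms; /1000 converts to ng
    return pmol * length_bp * AVG_BP_MASS / 1000.0


def pmol_from_ng(ng: float, length_bp: int) -> float:
    """Molar amount in pmol contained in `ng` of a double-stranded fragment."""
    return ng * 1000.0 / (length_bp * AVG_BP_MASS)


# Hypothetical fragment lengths (bp) for a three-insert, one-plasmid ligation
fragments = {"insert 1": 80, "insert 2": 130, "insert 3": 220, "plasmid": 5000}
target_pmol = 0.05
for name, length in fragments.items():
    print(f"{name}: {ng_for_pmol(target_pmol, length):.1f} ng")
```

Because mass scales with length, equal masses of a short insert and a long plasmid would give a large molar excess of the insert.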

Several additional considerations can assist in the construction of long genes
using gene assembly. First, the number of fragments and the length of constructs are limiting
factors. In addition to maintaining high standards of purity of both the oligonucleotide
primers and gene amplification products, it is important to keep the error rate low during
copying. Thus, one can optimally start with 100 ng of template, use only five rounds of gene
amplification, and finish with nearly 2 micrograms of product. This is more desirable for
reducing errors than using a large number of amplification steps. It is also desirable to use a
special copying enzyme such as Pfu DNA polymerase that has a low intrinsic error rate.
Further, it is desirable to use in vivo selection (in eukaryotic cells or tissues) rather than E. coli
cloning to reduce the incorporation of errors into the vectors. For example, a viral vector
such as an adenoviral vector or the retro-vectors of the preceding examples is auto-selecting.
A single correctly assembled adenovirus vector molecule, for example, leads to a lytic
infection (the viral products of which are cloned by limiting dilution on the appropriate
eukaryotic cells), even though it may be combined in a ligation mix with a large excess of
incorrectly assembled molecules that are non-functional. Thus, it is not necessary to have a
high efficiency, although high efficiency has been demonstrated in this system, in order to
achieve success in making, for example, gene therapy vectors.
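
The trade-off between yield and copying errors can be illustrated with the simplified model below. It assumes that the whole template mass is amplifiable target, that every cycle multiplies the product by (1 + efficiency), and that errors accumulate as a Poisson process counting each cycle as one duplication (which overstates errors somewhat). The error rate used is a hypothetical figure for a proofreading polymerase and is not taken from this specification.

```python
import math


def product_ng(template_ng: float, cycles: int, efficiency: float = 1.0) -> float:
    """Product mass after `cycles` rounds, each multiplying by (1 + efficiency)."""
    return template_ng * (1.0 + efficiency) ** cycles


def error_free_fraction(length_bp: int, cycles: int, error_rate: float) -> float:
    """Approximate fraction of product molecules carrying no copying error."""
    expected_errors = error_rate * length_bp * cycles
    return math.exp(-expected_errors)


ASSUMED_ERROR_RATE = 1.3e-6  # hypothetical errors per base per duplication

print("5 cycles, perfect doubling:", product_ng(100, 5), "ng")
print("5 cycles, 80% efficiency:  ", round(product_ng(100, 5, 0.8)), "ng")
for cycles in (5, 35):
    frac = error_free_fraction(3000, cycles, ASSUMED_ERROR_RATE)
    print(f"{cycles} cycles, 3 kb fragment: {frac:.1%} error-free")
```

In this model perfect doubling of 100 ng over five rounds would give 3.2 micrograms, so a yield of nearly 2 micrograms corresponds to a per-cycle efficiency somewhat below 100%.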

For long fragments (3-30 kb), it is desirable to use enzymes and procedures
that are designed to facilitate replication of long fragments. One such example is the
eLONGase system (Life Technologies). This system can copy up to 30 kb on a fragment
with proofreading. Considerations for long PCR are reviewed in Beck, 1998 (The Scientist, 6
January 1998, pp. 16-18).

Internal restriction sites are a potential problem, particularly with large
constructs, and can be overcome in a number of ways. Use of alternate enzymes, methylation
of internal restriction sites (such as by using methylated DNA precursors during synthesis to
leave the sites in primers unaffected), incorporation of the internal sites into the construct (if
they are non-palindromic), or mutagenesis of internal sites are exemplary ways to deal with
some of these issues.
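
A search for internal sites on both strands, of the kind the design program would need to perform, might look like the following sketch. The test sequence is bases 61-120 of SEQ ID NO:1, and CTGGAG is the hexamer present in the common primer tag of Example 4; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(sequence: str, site: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every occurrence of `site`."""
    patterns = {"+": site}
    if reverse_complement(site) != site:  # palindromes need only one scan
        patterns["-"] = reverse_complement(site)
    hits = []
    for strand, pattern in patterns.items():
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


# Bases 61-120 of SEQ ID NO:1
FRAGMENT = "CCCTCCCATCTGGAAAACTCCAGTTATAACTGGAGTTTTTCCTTTAAAAGCTTGTGAAAA"
print(find_sites(FRAGMENT, "CTGGAG"))
```

Any hit reported inside a fragment, rather than in its primer-derived ends, marks a site that would need one of the remedies listed above.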

For very large constructs, it is desirable to use enzymes such as SapI
(recognizing 7 nucleotides and leaving a 3 bp overhang). This enzyme digests once every
16,384 bp on average. There are 64 nucleotide triplet combinations, meaning that up to 32
fragments can be ligated in a circle using SapI. FokI and HgaI are other examples of class IIS
enzymes that are useful for making large constructs. HgaI has 5 bp overhangs, permitting
more than 500 HgaI fragments to be ligated. The FokI recognition site includes a Kozak ATG
start codon. In a preferred embodiment, a FokI site is inserted at the PuXXATG start site of a
cDNA encoding region. The cDNA is inserted in frame, providing a site for inserting and
switching coding sequences within a vector.
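
The counts quoted above can be checked by enumeration, as in the sketch below. It assumes that each junction consumes one overhang together with its reverse complement, so that the number of unique junctions is half the number of non-palindromic overhangs. The function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang_stats(length: int) -> tuple[int, int, int]:
    """Return (all overhangs, non-palindromic overhangs, unique junctions)."""
    ends = ["".join(p) for p in product("ACGT", repeat=length)]
    usable = [e for e in ends if e != reverse_complement(e)]
    return len(ends), len(usable), len(usable) // 2


for enzyme, length in (("SapI", 3), ("HgaI", 5)):
    total, usable, junctions = overhang_stats(length)
    print(f"{enzyme}: {total} overhangs, {usable} non-palindromic, "
          f"{junctions} junctions")

# Expected spacing of a 7-nucleotide recognition site in random sequence
print("one site per", 4 ** 7, "bp on average")
```

Overhangs of odd length can never be palindromic, which is why all 64 triplets are usable; an even-length overhang, such as one of 4 nucleotides, loses its 16 palindromic members.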

It will be readily understood by those skilled in the art that the foregoing
description has been for purposes of illustration only and that a variety of embodiments can
be envisioned without departing from the scope of the invention. Therefore, it is intended
that the invention not be limited except by the claims.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: NATURE TECHNOLOGY CORPORATION, ET AL. 
(ii) TITLE OF INVENTION: SELF-ASSEMBLING GENES, VECTORS AND USES THEREOF 
(iii) NUMBER OF SEQUENCES: 68 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: MUETING, RAASCH & GEBHARDT, P. A. 

(B) STREET: 119 NORTH FOURTH STREET, SUITE 203 

(C) CITY: MINNEAPOLIS 

(D) STATE: MINNESOTA 

(E) COUNTRY: USA 

(F) ZIP: 55401 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: Not Assigned 

(B) FILING DATE: 28-FEB-1998 

(C) CLASSIFICATION: 

(vii) PRIORITY APPLICATION DATA: 

(A) APPLICATION NUMBER: 60/070,910 

(B) FILING DATE: 28-FEB-1997 

(C) CLASSIFICATION: 



(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: MCCORMACK, MYRA M. 

(B) REGISTRATION NUMBER: 36,602 

(C) REFERENCE/DOCKET NUMBER: 228,00010201 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 612-305-1225 

(B) TELEFAX: 612-305-1228 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6225 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
TGAAGAATAA AAAATTACTG GCCTCTTGTG AGAACATGAA CTTTCACCTC GGAGCCCACC 60
CCCTCCCATC TGGAAAACTC CAGTTATAAC TGGAGTTTTT CCTTTAAAAG CTTGTGAAAA 120
ATTTGAGTCG TCGTCGAGAC TCCTCTACCC TGTGCAAAGG TGTATGAGTT TCGACCCCAG 180
AGCTCTGTGT GCTTTCTGTT GCTGCTTTAT TTCGACCCCA GAGCTCTGGT CTGTGTGCTT 240
TCATGTCGCT GCTTTATTAA ATCTTACCTT CTACATTTTA TGTATGGTCT CAGTGTCTTC 300
TTGGGTACGC GGCTGTCCCG GGACTTGAGT GTCTGAGTGA GGGTCTTCCC TCGAGGGTCT 360
TTCATTTGGT ACATGGGCCG GGAATTCGAG AATCTTTCAT TTGGTGCATT GGCCGGGAAT 420
TCGAAAATCT TTCATTTGGT GCATTGGCCG GGAAACAGCG CGACCACCCA GAGGTCCTAG 480
ACCCACTTAG AGGTAAGATT CTTTGTTCTG TTTTGGTCTG ATGTCTGTGT TCTGATGTCT 540
GTGTTCTGTT TCTAAGTCTG GTGCGATCGC AGTTTCAGTT TTGCGGACGC TCAGTGAGAC 600
CGCGCTCCGA GAGGGAGTGC GGGGTGGATA AGGATAGACG TGTCCAGGTG TCCACCGTCC 660
GTTCGCCCTG GGAGACGTCC CAGGAGGAAC AGGGGAGGAT CAGGGACGCC TGGTGGACCC 720
CTTTGAAGGC CAAGAGACCA TTTGGGGTTG CGAGATCGTG GGTTCGAGTC CCACCTCGTG 780
CCCAGTTGCG AGATCGTGGG TTCGAGTCCC ACCTCGTGTT TTGTTGCGAG ATCGTGGGTT 840
CGAGTCCCAC CTCGCGTCTG GTCACGGGAT CGTGGGTTCG AGTCCCACCT CGTGTTTTGT 900
TGCGAGATCG TGGGTTCGAG TCCCACCTCG CGTCTGGTCA CGGGATCGTG GGTTCGAGTC 960
CCACCTCGTG CAGAGGGTCT CAATTGGCCG GCCTTAGAGA GGCCATCTGA TTCTTCTGGT 1020
TTCTCTTTTT GTCTTAGTCT CGTGTCCGCT CTTGTTGTGA CTACTGTTTT TCTAAAAATG 1080
GGACAATCTG TGTCCACTCC CCTTTCTCTG ACTCTGGTTC TGTCGCTTGG TAATTTTGTT 1140
TGTTTACGTT TGTTTTTGTG AGTCGTCTAT GTTGTCTGTT ACTATCTTGT TTTTGTTTGT 1200
GGTTTACGGT TTCTGTGTGT GTCTTGTGTG TCTCTTTGTG TTCAGACTTG GACTGATGAC 1260
TGACGACTGT TTTTAAGTTA TGCCTTCTAA AATAAGCCTA AAAATCCTGT CAGATCCCTA 1320
TGCTGACCAC TTCCTTTCAG ATCAACAGCT GCCCTTACTC GAGCTCAAGC TTCGAATTCT 1380
GCAGTCGACG GTACCGCGGC CGCTAACTAA TAGCCCATTC TCCAAGGTAC GTAGCGGGGA 1440
TCAATTCCGC CCCCCCCCTA ACGTTACTGG CCGAAGCCGC TTGGAATAAG GCCGGTGTGC 1500
GTTTGTCTAT ATGTTATTTT CCACCATATT GCCGTCTTTT GGCAATGTGA GGGCCCGGAA 1560
ACCTGGCCCT GTCTTCTTGA CGAGCATTCC TAGGGGTCTT TCCCCTCTCG CCAAAGGAAT 1620
GCAAGGTCTG TTGAATGTCG TGAAGGAAGC AGTTCCTCTG GAAGCTTCTT GAAGACAAAC 1680
AACGTCTGTA GCGACCCTTT GCAGGCAGCG GAACCCCCCA CCTGGCGACA GGTGCCTCTG 1740
CGGCCAAAAG CCACGTGTAT AAGATACACC TGCAAAGGCG GCACAACCCC AGTGCCACGT 1800
TGTGAGTTGG ATAGTTGTGG AAAGAGTCAA ATGGCTCTCC TCAAGCGTAT TCAACAAGGG 1860
GCTGAAGGAT GCCCAGAAGG TACCCCATTG TATGGGATCT GATCTGGGGC CTCGGTGCAC 1920
ATGCTTTACA TGTGTTTAGT CGAGGTTAAA AAAACGTCTA GGCCCCCCGA ACCACGGGGA 1980
CGTGGTTTTC CTTTGAAAAA CACGATACGG GATCCACCGG TCGCCACCAT GGGTAAAGGA 2040
GAAGAACTTT TCACAGGAGT TGTCCCAATT CTTGTTGAAT TAGATGGTGA TGTTAATGGG 2100
CACAAATTTT CTGTCAGTGG AGAGGGTGAA GGTGATGCAA CATACGGAAA ACTTACCCTT 2160
AAATTTATTT GCACTACTGG AAAACTACCT GTTCCATGGC CAACACTTGT CACTACTTTC 2220
ACTTATGGTG TTCAATGCTT TTCAAGATAC CCAGATCATA TGAAACGGCA TGACTTTTTC 2280
AAGAGTGCCA TGCCCGAAGG TTATGTACAG GAAAGAACTA TATTTTTCAA AGATGACGGG 2340
AACTACAAGA CACGTGCTGA AGTCAAGTTT GAAGGTGATA CCCTTGTTAA TAGAATCGAG 2400
TTAAAAGGTA TTGATTTTAA AGAAGATGGA AACATTCTTG GACACAAATT GGAATACAAC 2460
TATAACTCAC ACAATGTATA CATCATGGCA GACAAACAAA AGAATGGAAC CAAAGTTAAC 2520
TTCAAAATTA GACACAACAT TGAAGATGGA AGCGTTCAAC TAGCAGACCA TTATCAACAA 2580
AATACTCCAA TTGGCGATGG CCCTGTCCTT TTACCAGACA ACCATTACCT GTCCACACAA 2640
TCTGCCCTTT CGAAAGATCC CAACGAAAAG AGAGACCACA TGGTCCTTCT TGAGTTTGTA 2700
ACAGCTGCTG GGATTACACA TGGCATGGAT GAACTATACA AGTCCGGATC TAGATAACTG 2760
TATCGATGGA TCCGAAGGCG GGGACAGCAG TGCAGTGGTG GACAGAAAGC AAGTGATCTA 2820
GGCCAGCAGC CTCCCTAAAG GGACTTCAGC CCACAAAGCC AAACTTGTGG CTTTAATACA 2880
AGCTCTGTAA ATGGTAAAAA AAAAAAAGTC TACACGGACA GCAGGTATGC TCTTGCCACT 2940
GTACAGAGCA ATATACAGAC AAAGAGAACT GTTGACATCT GCAGAGAAAG ACCTAAGATG 3000
CTGTGGCTAA AAGAAATCAG ATGGCAAATC TAACCGCCCA GGCATCCTAA AGAGCAATGA 3060
TCCTGACAGT CTGAAGACTA TCAAGTTATA GACAAATTAA GACTGGTAAA AAAAACCCTG 3120
TATAAAATAG TAAAAACTGA AAAAAGAAAA CTAGTCCTCT CATGAGAAGA CAGACCTGAC 3180
ATCTACTGAA AAATAGACTT TACTGGAAAA AATATGTGTA TGAATACCTT CTAGTTTTTG 3240
TGAACGTTCT CAAGATGGAT AAAAGCTTTT CCTTGTAAAA CGAGACTGAT CAGATAGTCA 3300
TCAAGAAGAT TGTTAAAGAA AATTTTCCAA GGTTCGGAGT GCCAAAAGCA ATAGTGTCAG 3360
ATAATGGTCC TGCCTTTGTT GCCCAGGTAA GTCAGGGTGT GGCCAAGTAT TTAGAGGTCA 3420
AATGAAAATT CCATTGTGTG TACAGACCTC AGAGCTCAGG AAAGATAAAA AAGAATAAAT 3480
AAAACTCTAA ACAGACCTTG ACAAAATTAA TCCTAGAGAC TGGCACAGAC TTACTTGGTA 3540
CTCCTTCCCC TTGCCCTATT TAGAACTGAG AATACTCCCT CTTGATTCGG TTTTACTCTT 3600
TTTAAGATCC TTTATGGGGC TCCTATGCCA TCACTGTCTT AAATGATGTG TTTAAACCTA 3660
TGTTGTTATA ATAATGATCT ATATGTTAAG TTAAAAGGCT TGCAGGTGGT GCAGAAAGAA 3720
GTCTGGTCAC AACTGGCTAC AGTGAACAAG CTGGGTACCC CAAGGACATC TTACCAGTTC 3780
CAGCCAGAGA TCTGATCTAC GATCCCCGGG TCGACCCGGG TCGACCCTGT GGAATGTGTG 3840
TCAGTTAGGG TGTGGAAAGT CCCCAGGCTC CCCAGCAGGC AGAAGTATGC AAAGCATGCA 3900
TCTCAATTAG TCAGCAACCA GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT 3960
GCAAAGCATG CATCTCAATT AGTCAGCAAC CATAGTCCCG CCCCTAACTC CGCCCATCCC 4020
GCCCCTAACT CCGCCCAGTT CCGCCCATTC TCCGCCCCAT GGCTGACTAA TTTTTTTTAT 4080
TTATGCAGAG GCCGAGGCCG CCTCGGCCTC TGAGCTATTC CAGAAGTAGT GAGGAGGCTT 4140
TTTTGGAGGC CTAGGCTTTT GCAAAAAGCT TCACGCTGCC GCAAGCACTC AGGGCGCAAG 4200
GGCTGCTAAA GGAAGCGGAA CACGTAGAAA GCCAGTCCGC AGAAACGGTG CTGACCCCGG 4260
ATGAATGTCA GCTACTGGGC TATCTGGACA AGGGAAAACG CAAGCGCAAA GAGAAAGCAG 4320
GTAGCTTGCA GTGGGCTTAC ATGGCGATAG CTAGACTGGG CGGTTTTATG GACAGCAAGC 4380
GAACCGGAAT TGCCAGCTGG GGCGCCCTCT GGTAAGGTTG GGAAGCCCTG CAAAGTAAAC 4440
TGGATGGCTT TCTTGCCGCC AAGGATCTGA TGGCGCAGGG GATCAAGATC TGATCAAGAG 4500
ACAGGATGAG GATCGTTTCG CATGATTGAA CAAGATGGAT TGCACGCAGG TTCTCCGGCC 4560
GCTTGGGTGG AGAGGCTATT CGGCTATGAC TGGGCACAAC AGACAATCGG CTGCTCTGAT 4620
GCCGCCGTGT TCCGGCTGTC AGCGCAGGGG CGCCCGGTTC TTTTTGTCAA GACCGACCTG 4680
TCCGGTGCCC TGAATGAACT GCAGGACGAG GCAGCGCGGC TATCGTGGCT GGCCACGACG 4740
GGCGTTCCTT GCGCAGCTGT GCTCGACGTT GTCACTGAAG CGGGAAGGGA CTGGCTGCTA 4800
TTGGGCGAAG TGCCGGGGCA GGATCTCCTG TCATCTCACC TTGCTCCTGC CGAGAAAGTA 4860
TCCATCATGG CTGATGCAAT GCGGCGGCTG CATACGCTTG ATCCGGCTAC CTGCCCATTC 4920
GACCACCAAG CGAAACATCG CATCGAGCGA GCACGTACTC GGATGGAAGC CGGTCTTGTC 4980
GATCAGGATG ATCTGGACGA AGAGCATCAG GGGCTCGCGC CAGCCGAACT GTTCGCCAGG 5040
CTCAAGGCGC GCATGCCCGA CGGCGAGGAT CTCGTCGTGA CCCATGGCGA TGCCTGCTTG 5100
CCGAATATCA TGGTGGAAAA TGGCCGCTTT TCTGGATTCA TCGACTGTGG CCGGCTGGGT 5160
GTGGCGGACC GCTATCAGGA CATAGCGTTG GCTACCCGTG ATATTGCTGA AGAGCTTGGC 5220
GGCGAATGGG CTGACCGCTT CCTCGTGCTT TACGGTATCG CCGCTCCCGA TTCGCAGCGC 5280
ATCGCCTTCT ATCGCCTTCT TGACGAGTTC TTCTGAGCGG GACTCTGGGG TTCGAAATGA 5340
CCGACCAAGC GACGCCCAAC CTGCCATCAC GAGATTTCGA TTCCACCGCC GCCTTCTATG 5400
AAAGGTTGGG CTTCGGAATC GTTTTCCGGG ACGGAATTCG TAATCTGCTG CTTGCAAACA 5460
AAAAAACCAC CGCTACCAGC GGTGGTTTGT TTGCCGGATC AAGAGCTACC AACTCTTTTT 5520
CCGAAGGTAA CTGGCTTCAG CAGAGCGCAG ATACCAAATA CTGTCCTTCT AGTGTAGCCG 5580
TAGTTAGGCC ACCACTTCAA GAACTCTGTA GCACCGCCTA CATACCTCGC TCTGCTAATC 5640
CTGTTACCAG TGGCTGCTGC CAGTGGCGAT AAGTCGTGTC TTACCGGGTT GGACTCAAGA 5700
CGATAGTTAC CGGATAAGGC GCAGCGGTCG [illegible] GGGGTTCGTG CACACAGCCC 5760
AGCTTGGAGC GAACGACCTA CACCGAACTG AGATACCTAC AGCGTGAGCA TTGAGAAAGC 5820
GCCACGCTTC CCGAAGGGAG AAAGGCGGAC AGGTATCCGG TAAGCGGCAG GGTCGGAACA 5880
GGAGAGCGCA CGAGGGAGCT TCCAGGGGGA AACGCCTGGT ATCTTTATAG TCCTGTCGGG 5940
TTTCGCCACC TCTGACTTGA GCGTCGATTT TTGTGATGCT CGTCAGGGGG GCGGAGCCTA 6000
TGGAAAAACG CCAGCAACGC CGAGATGCGC CGCCTCGAGT ACACCTGCGT CATGCTGAGA 6060
CCCTCAAGCC TCACTAAAAG GGTCCCTGCC TAGTTCTGTT TACTAATCTG CCTTATTCTG 6120
TTTTTGTTCC CATGTTAAAG ATAGAGTAAA TGCAGTATTC TCCACATAGA GATATAGACT 6180
TCTGTIAATTC TAAGATTAGA ATTATTTACA AGAAGAAGTG GGGAA 6225

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 487 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

CCTCCCATCT AGAGGTTGTT CTCGGAACAC TCCTAAACTT TTCACCCCAA AACTCCTCAC 60 

CCTAAAGTTC GAAAAAACTG TTCCAAGAAC ATTTTTGAGA TAAAGGCCTC CTAGAACAAC 120 

CTCAAAATGA CATTGCCAAA TGATAAGACA TGACTCCTTA GTTACGTAGG TTCCTTGATA 180

GGACATGACT CCTTAGTTAC GTAGGTTCCT TGATAGGACA TGACTCCTTA GTTACGTAGA 240 

TTCCTTTGGT AGAACTCCCT AGTGATGTAA ACTTGTACTT TCCCTGCCCA GTTCTCCCCC 300 

TTTGAGTTTT ACTATATAAG CCTGTAAAAA ATTTTTGCTG ACCGTCGAGA CTCCTCTACC 360

CTGTGCTAAG GTGTATGAGT TTCGACCCCA GAGCTCTGTG TGCTTCCATG TTGCTGCTTT 420 

ATTTCGACCC CAGAGCTCTG GTCTGTGTGC TTTCATGTCG CTGCTTTATT AAATCTTGCC 480 

TTCTACA 487 
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 366 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CCTCCCATCT AGAAAACATT TTTGAGATAA AGGCTTCCTG GAACAACCTC AAAATGAACC 60 

AGGTACTCCT TAGTTACGTA GGTTCCTTGA TAGGACATGA CTCCTTAGTT ACATAGATTC 120

CTTTGGCAGA ACTCCCTAGT GATGTAAACT TGTACTTTCC CTGCCCAGTT CTCCCCCTTT 180 

GAGTTTTACT ATATAAGCCT GTGAAAAATT TTGGCTGACC GTCGAGACTC CTCTACCCTG 240

TGCTAAGGTG TATGAGTTTC GACCCCAGAG CTCTGTGTGC TTCCATGTTG CTGCTTTATT 300 

TCGACCCCAG AGCTCTGGTC TGTGTGCTTT CATGTTGCTG CCTTATTAAA TCTTGCCTTC 360 
TACATT 366 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 304 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

CCTCCCATCT AGAGATTGTT CCCAGAACAC TCCTGAACTC TTCACCCCAG AATGCATGCC 60 

TGAACTCCTC ACCCTAGAGT TCGAACCCTC CCAACTAAAG ACTGTTCCAA GAACATTTTT 120 

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCGGG TACATTGCCA AATAATAGGA 180 

CATGACCCCT TAGTTACGTA AAATCCCTTG GCAGAACCCC TTGTCCCTTG GCAGAACCCC 240 

TTAGTTATGT AAACTTGTAC TTTCCCTACC CCGCTCTCCC CCCTTGAGTT TTTCCTATAT 300

AAGC 304 
(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 304 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CCTCCCATCT AGAGAGTGTT CCCAGAACAC TCCTGAACTC TTCACCCCAG AATGCATTCC 60

TGAACTCCTC ACCCTAGAGT TCGAACCCTC CCAACTAAAG ACTGTTCCAA GAACATTTTT 120 

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCGGA TACATTGCCA AATAATAGGA 180 

CATGACCCCT TAGTTACGTA GAATCCCTTG GCAGAACCCC TTGTCCCTTG GCAGAACCCC 240 

TTAGTTATGT AAACTTGTAC TTTCCCTACC CCGCTCTCCC CCCTTGAGTT TTTCCTATAT 300 

AAGC 304 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 304 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CCTCCCATCT AGAGAGTGTT CCCAGAACAC TCCTGAACTC TTCACCCCAG AATGCATTCC 60

TGAACTCCTC ATCCTAGAGT TCGAACCCTC CCAACTAAAG ACTGTTCCAA GAACATTTTT 120

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCTGG TACATTGCCA AATAATAGGA 180 

CATGACCCTT TAGTTACGTA GAATCCCTTG GCAGAACCCC TTGTCCCTTG GCAGAACCCC 240 

TTAGTTATGC AAACTTGTAC TTTCTCTGCC CCGCTCTCCC CCCTTGAGTT TTTCCTATAT 300 

AAGC 304

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 304 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CCTCCCATCT AGAGAGTGTT CCCAGAACAC TCCTGAACTC TTCACCTCAA AATGCATTCC 60

TGAACTCCTC ACCCTAGAGT TCGAACCCTC CCAACTAAAG ACTGTTCCAA GAACATTTTT 120

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCAGG TACATTGCCA AATAATAGGA 180

CATGACCCTT TAGTTACGTA GAATCCCTTG GCAGAACCCC TTGTCCCTTG GCAGAACCCC 240

TTAGTTATGC AAACTTGTAC TTTCTCTGCC CCGCTCTCCC CCCTTGAGTT TTTCCTATAT 300

AAGC 304
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 305 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

CCTCCCATCT AGAGATTGTT CCCAGAACAC TCCTGAACTC TTCACCCCAG AATGCATTCC 60

TGAACTCCTC ACCCTAGAGT TCGAACCCTC CCAACTAAAG ACTGTTCCAA GAACATTTTT 120

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCGGA TACATTGCCA AATAATAGGA 180

CATGACCCCT TAGTTACGTA GAATTCCCTT GGCAGAACCC CTTGTCCCTT GGCAGAACCC 240

CTTAGTTATG CAAACTTGTA CTTTCCCTGC CCCGCTCTCC CCCCTTGAGG TTTTCCTATA 300

TAAGC 305
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 305 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

CCTCCCATCT AGAGAGTGTT CCCAGAACAC TCCTGAACTC TTCACCCCAG AATGCATTCC 60 

TGAACCCCTC ACCCTAGAGT TCGAACCCTC CCAACTAAAG ACTGTTCCAA GAACATTTTT 120 

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCAGG TACATTGCCA AATAATAGGA 180 

CATGACCCCT TAGTTACGTA GAATTCCCTT GGCAGAACCC CTTGTCCCTT GGCAGAACCC 240 

CTTAGTTATG CGAACTTGTA CTTTCCCTGC CCCGCTCTCC CCCCTTGAGT TTTTCCTATA 300 

TAAGC 305 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 306 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

CCCTCCCATC TAGAGAGTGT TCCCAGAACA CTCCTGAACT CTTCATCCCA GAATGCATTC 60 

CTGAACTCCT CACCCTATAG TTCGAACCCT CCCAACTAAA GACTGTTCCA AGAACATTTT 120 

TGAGATAAGG GCCTCCTGGA ACAACCTCAG AATGAACCGG GTACATTGCC AAATAATAGG 180 

ACATGACCCC TTAGTTACGT AGAATTCCCT TGGCAGAACC CCTTGTCGCT TGGCAGAACC 240 

CCTTAGTTAT GTAAACTTGT ACTTTCCCTG CCCCGCTCTC CCCCCTTGAG TTTTTACTAT 300 

ATAAGC 306 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 305 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

CCTCCCATCT AGAGAGTGTT CCCAAAACAC TCCTGAACTC TTCACCCCAG AATGCATTCC 60 

TGAACTCCTC ACCCTAAAGT TCAAACCCTC CCAACTAAAG ACTGTTCCAA GAACATTTTT 120 

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCGGG TACATTGCCA AATAATAGGA 180 

CATGACCCCT TAGTTACACA GAATTCCCTT GGCAAAACCC CTTGTCCCTT GGCAGAACCC 240 

CTTAGTTATG CAAACTTGTA CTTTCCCTGC CCAGCTCTCC CCCCTTGAGT TTTTCCTATA 300 

TAAGC 305 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 304 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CCTCCCATCT AGAGAGTGTT CCCAGAACAC TCCTGAACTC TTCACCCCAG AATGCATTCC 60 

TGAACTCCTC ACCCTAGAGT TTGAACCCTC CCAACTAAAG ACTGTTCCAA GAACT^TCTTT 120 

GAGATAAGGG CCTCCTGGAA CAACCTCAGA ATGAACCGGG TACATTGCCA AATAATAGGA 180 

CATGACCCCT TAGTTACGTA GAATTCCCTT GGCAGAACCC CTTGTCGCTT GGCAGAACCC 240 

CTTAGTTATG CAAACTTGTA CTTTCCCTGC CCCGCTCTCC CCCTTGAGTT TTTCCTATAT 300 

AAGC 304 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 303 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CCTCCCATCT AGAGAGTGTT CCCAGAACAC TCCTAAACTC TTCACCCCAG AATGCATTCC 60 
TGAACTCCTC ACCCTAGAGT TCGAACCCTT CCAACTAAAG ACTGTTCCAA GAACATTTTT 120 

GAGATAAGGG CCTCCTGGAA CAACCTCAAA ATGAACCGGG TACATTGCCA AATGATAGGA 180 

CATGACCCCT TAGTTACGTA GATTCCCTTG GCAGAACCCC TTGTCCCTTG GCAGAACCCC 240 

CTAGTGATGT AAACTTGTAC TTTCCCTGCC CAGCTCTCCC CCCTTGAGTT TTCCTATATA 300 

AGC 303 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8657 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

TGAAGAATAA AAAATTACTG GCCTCTTGTG AGAACATGAA CTTTCACCTC GGAGCCCACC 60 

CCCTCCCATC TGGAAAACAT ACTTGAGAAA AACATTTTCT GGAACAACCA CAGAATGTTT 120 

CAACAGGCCA GATGTATTGC CAAACACAGG ATATGACTCT TTGGTTGAGT AAATTTGTGG 180 

TTGTTAAACT TCCCCTATTC CCTCCCCATT CCCCCTCCCA GTTTGTGGTT TTTTCCTTTA 240 

AAAGCTTGTG AAAAATTTGA GTCGTCGTCG AGACTCCTCT ACCCTGTGCA AAGGTGTATG 300 

AGTTTCGACC CCAGAGCTCT GTGTGCTTTC TGTTGCTGCT TTATTTCGAC CCCAGAGCTC 360 

TGGTCTGTGT GCTTTCATGT CGCTGCTTTA TTAAATCTTA CCTTCTACAT TTTATGTATG 420 

GTCTCAGTGT CTTCTTGGGT ACGCGGCTGT CCCGGGACTT GAGTGTCTGA GTGAGGGTCT 480 

TCCCTCGAGG GTCTTTCATT TGGTACATGG GCCGGGAATT CGAGAATCTT TCATTTGGTG 540 

CATTGGCCGG GAATTCGAAA ATCTTTCATT TGGTGCATTG GCCGGGAAAC AGCGCGACCA 600 

CCCAGAGGTC CTAGACCCAC TTAGAGGTAA GATTCTTTGT TCTGTTTTGG TCTGATGTCT 660 

GTGTTCTGAT GTCTGTGTTC TGTTTCTAAG TCTGGTGCGA TCGCAGTTTC AGTTTTGCGG 720 

ACGCTCAGTG AGACCGCGCT CCGAGAGGGA GTGCGGGGTG GATAAGGATA GACGTGTCCA 780 

GGTGTCCACC GTCCGTTCGC CCTGGGAGAC GTCCCAGGAG GAACAGGGGA GGATCAGGGA 840 

CGCCTGGTGG ACCCCTTTGA AGGCCAAGAG ACCATTTGGG GTTGCGAGAT CGTGGGTTCG 900 

AGTCCCACCT CGTGCCCAGT TGCGAGATCG TGGGTTCGAG TCCCACCTCG TGTTTTGTTG 960 

CGAGATCGTG GGTTCGAGTC CCACCTCGCG TCTGGTCACG GGATCGTGGG TTCGAGTCCC 1020 

ACCTCGTGTT TTGTTGCGAG ATCGTGGGTT CGAGTCCCAC CTCGCGTCTG GTCACGGGAT 1080 

CGTGGGTTCG AGTCCCACCT CGTGCAGAGG GTCTCAATTG GCCGGCCTTA GAGAGGCCAT 1140 

CTGATTCTTC TGGTTTCTCT TTTTGTCTTA GTCTCGTGTC CGCTCTTGTT GTGACTACTG 1200 

TTTTTCTAAA AATGGGACAA TCTGTGTCCA CTCCCCTTTC TCTGACTCTG GTTCTGTCGC 1260 

TTGGTAATTT TGTTTGTTTA CGTTTGTTTT TGTGAGTCGT CTATGTTGTC TGTTACTATC 1320 

TTGTTTTTGT TTGTGGTTTA CGGTTTCTGT GTGTGTCTTG TGTGTCTCTT TGTGTTCAGA 1380 

CTTGGACTGA TGACTGACGA CTGTTTTTAA GTTATGCCTT CTAAAATAAG CCTAAAAATC 1440 

CTGTCAGATC CCTATGCTGA CCACTTCCTT TCAGATCAAC AGCTGCCCTT ACGTATCGAT 1500 
GGATCCCTCG ACTAACTAAT AGCCCATTCT 
CCCCTAACGT TACTGGCCGA AGCCGCTTGG 
TATTTTCCAC CATATTGCCG TCTTTTGGCA 
TCTTGACGAG CATTCCTAGG GGTCTTTCCC 
ATGTCGTGAA GGAAGCAGTT CCTCTGGAAG 
CCCTTTGCAG GCAGCGGAAC CCCCCACCTG 
GTGTATAAGA TACACCTGCA AAGGCGGCAC 
TTGTGGAAAG AGTCAAATGG CTCTCCTCAA 
AGAAGGTACC CCATTGTATG GGATCTGATC 
TTTAGTCGAG GTTAAAAAAA CGTCTAGGCC 
GAAAAACACG ATAATAATCA TGGGCGCGGA 
AAACCCTGGC GTTACCCAAC TTAATCGCCT 
TAATAGCGAA GAGGCCCGCA CCGATCGCCC 
ATGGCGCTTT GCCTGGTTTC CGGCACCAGA 
TCTTCCTGAG GCCGATACTG TCGTCGTCCC 
GCCCATCTAC ACCAACGTAA CCTATCCCAT 
GAATCCGACG GGTTGTTACT CGCTCACATT 
CCAGACGCGA ATTATTTTTG ATGGCGTTAA 
CTGGGTCGGT TACGGCCAGG ACAGTCGTTT 
ACGCGCCGGA GAAAACCGCC TCGCGGTGAT 
GGAAGATCAG GATATGTGGC GGATGAGCGG 
ACCGACTACA CAAATCAGCG ATTTCCATGT 
CGCTGTACTG GAGGCTGAAG TTCAGATGTG 
AGTTTCTTTA TGGCAGGGTG AAACGCAGGT 
AATTATCGAT GAGCGTGGTG GTTATGCCGA 
CCCGAAACTG TGGAGCGCCG AAATCCCGAA 
CGCCGACGGC ACGCTGATTG AAGCAGAAGC 
TGAAAATGGT CTGCTGCTGC TGAACGGCAA 
CGAGCATCAT CCTCTGCATG GTCAGGTCAT 
GCTGATGAAG CAGAACAACT TTAACGCCGT 
GTGGTACACG CTGTGCGACC GCTACGGCCT 
CCACGGCATG GTGCCAATGA ATCGTCTGAC 
CGAACGCGTA ACGCGAATGG TGCAGCGCGA 
GCTGGGGAAT GAATCAGGCC ACGGCGCTAA 



CCAAGGTCGA 


GCGGGATCAA 






AATAAGGCCG 


GTGTGCGTTT 






ATGTGAGGGC 


CCGGAAACCT 


GGCCCTGTCT 


X Q O V 


CTCTCGCCAA AGGAATGCAA 


GGTCTGTTGA 


1740 


CTTCTTGAAG 


ACAAACAACG 


TCTGTAGCGA 


1800 


GCGACAGGTG 


CCTCTGCGGC 


CAAAAGCCAC 


I860 

Jt w w w 


AACCCCAGTG 


CCACGTTGTG 


AGTTGGATAG 


1920 


GCGTATTCAA 


CAAGGGGCTG 


AAGGATGCCC 


X V 


TGGGGCCTCG 


GTGCACATGC 


jL X X nSmrn X V3 X N9 


9040 


CCCCGAACCA 


CGGGGACGTG 


Ul X 1 XImtI^X X 1 




TCCCGTCGTT 


TTACAACGTC 




91 fin 


TGCAGCACAT 


CCCCCTTTCG 




999n 


TTCCCAACAG 


TTGCGCAGCC 


TGZXaTGGPGa 


99fln 


AGCGGTGCCG 


GAAAGCTGGC 






CTCAAACTGG 


CAGATGCACG 






TACGGTCAAT 


CCGCCGTTTG 


X X ^i^V^W^A^VauA 


94 fin 


TAATGTTGAT 


GAAAGCTGGC 






CTCGGCGTTT 


CATCTGTGGT 






GCCGTCTGAA 


TTTGACCTGA 


UV^OV^AX X X X X 


9fi40 


GGTGCTGCGT 


TGGAGTGACG 


GrAGTTATPT 

\3\^T%\3 X Xf^X\^ L 


2700 


CATTTTCCGT 


GACGTCTCGT 


TGCTGCATAA 


2760 


TGCCACTCGC 


TTTAATGATG 


ATTTPACPPR 
/\x X x^«nv3wwU 


2890 


CGGCGAGTTG 


CGTGACTACC 


TACGGGTAAC 


2880 


CGCCAGCGGC 


ACCGCGCCTT 


TCGGCGGTGA 


2940 


TCGCGTCACA 


CTACGTCTGA 


ACGTCGAAAA 


3000 


TCTCTATCGT 


GCGGTGGTTG 


AACTGCACAC 


3060 


CTGCGATGTC 


GGTTTCCGCG 


AGGTGCGGAT 


3120 


GCCGTTGCTG 
GGATGAGCAG 


ATTCGAGGCG 
ACGATGGTGC 


TTAACCGTCA 
AGGATATCCT 


3180 
3240 


GCGCTGTTCG 


CATTATCCGA 


ACCATCCGCT 


3300 


GTATGTGGTG 


GATGAAGCCA 


ATATT6AAAC 


3360 


CGATGATCCG 


CGCTGGCTAC 


CGGCGATGAG 


3420 


TCGTAATCAC 


CCGAGTGTGA 


TCATCTGGTC 


3480 


TCACGACGCG 


CTGTATCGCT 


GGATCAAATC 


3540 



WO 98/38326 

TGTCGATCCT TCCCGCCCGG TGCAGTATGA AGGCGGCGGA GCCGACACCA CGGCCACCGA 3600 

TATTATTTGC CCGATGTACG CGCGCGTGGA TGAAGACCAG CCCTTCCCGG CTGTGCCGAA 3660 

ATGGTCCATC AAAAAATGGC TTTCGCTACC TGGAGAGACG CGCCCGCTGA TCCTTTGCGA 3720 

ATACGCCCAC GCGATGGGTA ACAGTCTTGG CGGTTTCGCT AAATACTGGC AGGCGTTTCG 3780 

TCAGTATCCC CGTTTACAGG GCGGCTTCGT CTGGGACTGG GTGGATCAGT CGCTGATTAA 3840 

ATATGATGAA AACGGCAACC CGTGGTCGGC TTACGGCGGT GATTTTGGCG ATACGCCGAA 3900 

CGATCGCCAG TTCTGTATGA ACGGTCTGGT CTTTGCCGAC CGCACGCCGC ATCCAGCGCT 3960 

GACGGAAGCA AAACACCAGC AGCAGTTTTT CCAGTTCCGT TTATCCGGGC AAACCATCGA 4020 

AGTGACCAGC GAATACCTGT TCCGTCATAG CGATAACGAG CTCCTGCACT GGATGGTGGC 4080 

GCTGGATGGT AAGCCGCTGG CAAGCGGTGA AGTGCCTCTG GATGTCGCTC CACAAGGTAA 4140 

ACAGTTGATT GAACTGCCTG AACTACCGCA GCCGGAGAGC GCCGGGCAAC TCTGGCTCAC 4200 

AGTACGCGTA GTGCAACCGA ACGCGACCGC ATGGTCAGAA GCCGGGCACA TCAGCGCCTG 4260 

GCAGCAGTGG CGTCTGGCGG AAAACCTCAG TGTGACGCTC CCCGCCGCGT CCCACGCCAT 4320 

CCCGCATCTG ACCACCAGCG AAATGGATTT TTGCATCGAG CTGGGTAATA AGCGTTGGCA 4380 

ATTTAACCGC CAGTCAGGCT TTCTTTCACA GATGTGGATT GGCGATAAAA AACAACTGCT 4440 

GACGCCGCTG CGCGATCAGT TCACCCGTGC ACCGCTGGAT AACGACATTG GCGTAAGTGA 4500 

AGCGACCCGC ATTGACCCTA ACGCCTGGGT CGAACGCTGG AAGGCGGCGG GCCATTACCA 4560 

GGCCGAAGCA GCGTTGTTGC AGTGCACGGC AGATACACTT GCTGATGCGG TGCTGATTAC 4620 

GACCGCTCAC GCGTGGCAGC ATCAGGGGAA AACCTTATTT ATCAGCCGGA AAACCTACCG 4680 

GATTGATGGT AGTGGTCAAA TGGCGATTAC CGTTGATGTT GAAGTGGCGA GCGATACACC 4740 

GCATCCGGCG CGGATTGGCC TGAACTGCCA GCTGGCGCAG GTAGCAGAGC GGGTAAACTG 4800 

GCTCGGATTA GGGCCGCAAG AAAACTATCC CGACCGCCTT ACTGCCGCCT GTTTTGACCG 4860 

CTGGGATCTG CCATTGTCAG ACATGTATAC CCCGTACGTC TTCCCGAGCG AAAACGGTCT 4920 

GCGCTGCGGG ACGCGCGAAT TGAATTATGG CCCACACCAG TGGCGCGGCG ACTTCCAGTT 4980 

CAACATCAGC CGCTACAGTC AACAGCAACT GATGGAAACC AGCCATCGCC ATCTGCTGCA 5040 

CGCGGAAGAA GGCACATGGC TGAATATCGA CGGTTTCCAT ATGGGGATTG GTGGCGACGA 5100 

CTCCTGGAGC CCGTCAGTAT CGGCGGAATT CCAGCTGAGC GCCGGTCGCT ACCATTACCA 5160 

GTTGGTCTGG TGTCAAAAAT AATAATAACC GGGCAGGGGG GATCCGAAGG CGGGGACAGC 5220 

AGTGCAGTGG TGGACAGAAA GCAAGTGATC TAGGCCAGCA GCCTCCCTAA AGGGACTTCA 5280 

GCCCACAAAG CCAAACTTGT GGCTTTAATA CAAGCTCTGT AAATGGTAAA AAAAAAAAAG 5340 

TCTACACGGA CAGCAGGTAT GCTCTTGCCA CTGTACAGAG CAATATACAG ACAAAGAGAA 5400 

CTGTTGACAT CTGCAGAGAA AGACCTAAGA TGCTGTGGCT AAAAGAAATC AGATGGCAAA 5460 

TCTAACCGCC CAGGCATCCT AAAGAGCAAT GATCCTGACA GTCTGAAGAC TATCAAGTTA 5520 
TAGACAAATT 


AAGACTGGTA 


AAAAAAACCC 


TGTATAAAAT 


AGTAAAAAuT 


oa aaaaapan 




AACTAGTCCT 


CTCATGAGAA GACAGACCTG ACATCTACTG 


A A n TV A*PTVf*<7V^ 


TTT a PTPP a a 


304U 


AAAATATGTG 


TATGAATACC 


TTCTAGTTTT 


TGTGAACGTT 




a*p a a a a fSPTT 


3 /UU 


TTCCTTGTAA AACGAGACTG 


ATCAGATAGT 


CATCAAGAAG 




rvinnX XXX Vi.«Vii 


O / DU 


AAGGTTCGGA 


GTGCCAAAAG 


CAATAGT6TC 


AGATAATGGT 


WW i VsWw 1 1 J. u 


TTfiPPPAGRT 




AAGTCAGGGT 


GTGGCCAAGT 


ATTTAGAGGT 


CAAATGAAAA 






O O w 


TCAGAGCTCA 


GGAAAGATAA 


AAAAGAATAA 


ATAAAACTCT 




TGACAAAATT 




AATCCTAGAG 


ACTGGCACAG 


ACTTACTTGG 


TACTCCTTCC 


PPT TCPPPTA 


XXX n\aiv\w X \9 




AGAATACTCC 


CTCTTGATTC 


GGTTTTACTC 


TTTTTAAGAT 




fiPTPPTATfiP 
u w X WW X n X \9 W 




CATCACTGTC 


TTAAATGATG 


TGTTTAAACC 


TATGTTGTTA 


xAAiAAiliAl 


PT* A T A T/^ T T A 
WlAx Aiol I A 




AGTTAAAAGG 


CTTGCAGGTG 


GTGCAGAAAG AAGTCTGGTC 




AwAu 1 u AAw A 


£1 on 


AGCTGGGTAC 


CCCAAGGACA 


TCTTACCAGT 


TCCAGCCAGA 


GATCTGATCT 


AwGATwwwwG 




GGTCGACCCG 


GGTCGACCCT 


GTGGAATGTG 


TGTCAGTTAG 


GGTGTGGAAA 


GTwwwwAGGw 


Q JUU 


TCCCCAGCAG 


GCAGAAGTAT 


GCAAAGCATG 


CATCTCAATT 


AGTCAGCAAC 


WAGGTGTGGA 


0 JoU 


AAGTCCCCAG 


GCTCCCCAGC 


AGGCAGAAGT 


ATGCAAAGCA 


rn/^OTV moniOTV TV 

TGCATCTCAA 


t 

rnn*TVO'POTVO/^TV 
TTAGTwAGwA 


£ il O A 


ACCATAGTCC 


CGCCCCTAAC 


TCCGCCCATC 


CCGCCCCTAA 


CTCCGCCCAG 


TTwwGwwwAT 


£>l O A 


TCTCCGCCCC 


ATGGCTGACT 


AATTTTTTTT 


ATTTATGCAG 


AGGCCGAGGC 


wGwwxwG6ww 


C C>l A 
DP4U 


TCTGAGCTAT 


TCCAGAAGTA 


GTGAGGAGGC 


TTTTTTGGAG 


GCCTAGGCTT 


*p*pppa a a aap 
1 X uwAAAAAo 


££nA 


CTTCACGCTG 


CCGCAAGCAC 


TCAGGGCGCA AGGGCTGCTA 


TV Tv/^f^ TV TV r^r^fc 


aapappTapa 
AAwAwV9 i AuA 


OOOU 


AAGCCAGTCC 


GCAGAAACGG 


TGCTGACCCC 


GGATGAATGT 


PBPPTZiPTCC 


/tPTATPTGGA 
VjW X riX W X vaun 




CAAGGGAAAA 


CGCAAGCGCA 


AAGAGAAAGC 


AGGTAGCTTG 


P A(lT^t^<^PTT 
Wriu 1 uoo W X X 


APATrJGPnAT 




AGCTAGACTG GGCGGTTTTA TGGACAGCAA GCGAACCGGA ATTGCCAGCT GGGGCGCCCT 6840 

CTGGTAAGGT TGGGAAGCCC TGCAAAGTAA ACTGGATGGC TTTCTTGCCG CCAAGGATCT 6900 

GATGGCGCAG GGGATCAAGA TCTGATCAAG AGACAGGATG AGGATCGTTT CGCATGATTG 6960 

AACAAGATGG ATTGCACGCA GGTTCTCCGG CCGCTTGGGT GGAGAGGCTA TTCGGCTATG 7020 

ACTGGGCACA ACAGACAATC GGCTGCTCTG ATGCCGCCGT GTTCCGGCTG TCAGCGCAGG 7080 

GGCGCCCGGT TCTTTTTGTC AAGACCGACC TGTCCGGTGC CCTGAATGAA CTGCAGGACG 7140 

AGGCAGCGCG GCTATCGTGG CTGGCCACGA CGGGCGTTCC TTGCGCAGCT GTGCTCGACG 7200 

TTGTCACTGA AGCGGGAAGG GACTGGCTGC TATTGGGCGA AGTGCCGGGG CAGGATCTCC 7260 

TGTCATCTCA CCTTGCTCCT GCCGAGAAAG TATCCATCAT GGCTGATGCA ATGCGGCGGC 7320 

TGCATACGCT TGATCCGGCT ACCTGCCCAT TCGACCACCA AGCGAAACAT CGCATCGAGC 7380 

GAGCACGTAC TCGGATGGAA GCCGGTCTTG TCGATCAGGA TGATCTGGAC GAAGAGCATC 7440 

AGGGGCTCGC GCCAGCCGAA CTGTTCGCCA GGCTCAAGGC GCGCATGCCC GACGGCGAGG 7500 

ATCTCGTCGT GACCCATGGC GATGCCTGCT TGCCGAATAT CATGGTGGAA AATGGCCGCT 7560 
TTTCTGGATT CATCGACTGT GGCCGGCTGG GTGTGGCGGA CCGCTATCAG GACATAGCGT 7620 

TGGCTACCCG TGATATTGCT GAAGAGCTTG GCGGCGAATG GGCTGACCGC TTCCTCGTGC 7680 

TTTACGGTAT CGCCGCTCCC GATTCGCAGC GCATCGCCTT CTATCGCCTT CTTGACGAGT 7740 

TCTTCTGAGC GGGACTCTGG GGTTCGAAAT GACCGACCAA GCGACGCCCA ACCTGCCATC 7800 

ACGAGATTTC GATTCCACCG CCGCCTTCTA TGAAAGGTTG GGCTTCGGAA TCGTTTTCCG 7860 

GGACGGAATT CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT 7920 

GTTTGCCGGA TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC 7980 

AGATACCAAA TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG 8040 

TAGCACCGCC TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG 8100 

ATAAGTCGTG TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT 8160 

CGGGCTGAAC GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC 8220 

TGAGATACCT ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG 8280 

ACAGGTATCC GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG 8340 

GAAACGCCTG GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT 8400 

TTTTGTGATG CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCCGAGATGC 8460 

GCCGCCTCGA GTACACCTGC GTCATGCTGA GACCCTCAAG CCTCACTAAA AGGGTCCCTG 8520 

CCTAGTTCTG TTTACTAATC TGCCTTATTC TGTTTTTGTT CCCATGTTAA AGATAGAGTA 8580 

AATGCAGTAT TCTCCACATA GAGATATAGA CTTCTGAAAT TCTAAGATTA GAATTATTTA 8640 

CAAGAAGAAG TGGGGAA 8657 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6359 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

TGAAGAATAA AAAATTACTG GCCTCTTGTG AGAACATGAA CTTTCACCTC GGAGCCCACC 60 

CCCTCCCATC TGGAAAACAT ACTTGAGAAA AACATTTTCT GGAACAACCA CAGAATGTTT 120 

CAACAGGCCA GATGTATTGC CAAACACAGG ATATGACTCT TTGGTTGAGT AAATTTGTGG 180 

TTGTTAAACT TCCCCTATTC CCTCCCCATT CCCCCTCCCA GTTTGTGGTT TTTTCCTTTA 240 

AAAGCTTGTG AAAAATTTGA GTCGTCGTCG AGACTCCTCT ACCCTGTGCA AAGGTGTATG 300 

AGTTTCGACC CCAGAGCTCT GTGTGCTTTC TGTTGCTGCT TTATTTCGAC CCCAGAGCTC 360 
TGGTCTGTGT GCTTTCATGT CGCTGCTTTA 



GTCTCAGTGT CTTCTTGGGT ACGC6GCTGT 
5 TCCCTCGAGG GTCTTTCATT TGGTACATGG 



CATTGGCCGG GAATTCGAAA ATCTTTCATT 



CCCAGAGGTC CTAGACCCAC TTAGAGGTAA 

10 

GTGTTCTGAT GTCTGTGTTC TGTTTCTAAG 



AC6CTCAGTG AGACCGCGCT CCGAGAGGGA 
15 GGTGTCCACC GTCCGTTCGC CCTGGGAGAC 



CGCCTGGTGG ACCCCTTTGA AGGCCAAGAG 



AGTCCCACCT CGTGCCCAGT TGCGAGATCG 

20 

CGAGATCGTG GGTTCGA6TC CCACCTCGCG 
ACCTCGTGTT TTGTTGCGAG ATCGTGGGTT 
25 CGTGGGTTCG AGTCCCACCT CGTGCAGAGG 



CTGATTCTTC TGGTTTCTCT TTTTGTCTTA 



TTTTTCTAAA AATGGGACAA TCTGTGTCCA 

30 

TTGGTAATTT TGTTTGTTTA CGTTTGTTTT 



TTGTTTTTGT TTGTGGTTTA CGGTTTCTGT 
35 CTTGGACTGA TGACTGACGA CTGTTTTTAA 



CTGTCAGATC CCTATGCTGA CCACTTCCTT 
AAGCTTCGAA TTCTGCAGTC GACGGTACCG 

40 

GTACGTAGCG GGGATCAATT CCGCCCCCCC 



TAAGGCCGGT GTGCGTTTGT CTATATGTTA 
45 GTGAGGGCCC GGAAACCTGG CCCTGTCTTC 



CTCGCCAAAG GAATGCAAGG TCTGTTGAAT 



TCTTGAAGAC AAACAACGTC TGTAGCGACC 

50 

GACAGGTGCC TCTGCGGCCA AAAGCCACGT 
CCCCAGTGCC ACGTTGTGAG TTGGATAGTT 



GTATTCAACA AGGGGCTGAA GGATGCCCAG 

55 

GGGCCTCGGT GCACATGCTT TACATGTGTT 



CCGAACCACG GGGACGTGGT TTTCCTTTGA 
60 CCATGGGTAA AGGAGAAGAA CTTTTCACAG 



GTGATGTTAA TGGGCACAAA TTTTCTGTCA 



GAAAACTTAC CCTTAAATTT ATTTGCACTA 

65 

TTGTCACTAC TTTCACTTAT GGTGTTCAAT 



TTAAATCT TA 


CCTTCTACAT 


TTTATGTATG 


42-0 


L^CUGGCsACTT 


GAGTGTCTGA 


GTGAGGGTCT 


4oO 




r*01V^JV TV m/^i"i' 

wiaAGAATCTT 


TCATTTGGTG 


34u 








Cf\t\ 

ouu 




Ffl fp fT> fW ff* 




OOU 


ip ^ m m ^ 7» 


ruGwAvaTTTC 


Au r r 1 TbCob 


Ton 


GTGCGGGG T G 


/-• TV m 7\ 7\ f>f T\ rn TV 

GATAAGGATA 


GACGTGTCCA 


n o f\ 


GTCCCAGGAG 


GAACAGGGGA 


GGATCAGGGA 


840 


ACCATTTGGG 


GTTGCGAGAT 


CGTGGGTTCG 


900 


TGGGTTC6A6 


TCCCACCTCG 


TGTTTTGTTG 


f\ ^ ^ 
960 


TCTGGTCACG 


GGATCGTGGG 


TTCGAGTCCC 


1020 


CGAGTCCCAC 


CTCGCGTCTG 


GTCACGGGAT 


t i\ r\ r\ 

1080 


GTCTCAATTG 


GCCGGCCTTA 


GAGAGGCCAT 


1140 


GTCTCGTGTC 


CGCTCTTGTT 


GTGACTACTG 


1200 


CTCCCCTTTC 


TCTGACTCTG 


GTTCTGTCGC 


1260 


TGTGAGTCGT 


CTATGTTGTC 


TGTTACTATC 


1320 


GTGTGTCTTG 


TGTGTCTCTT 


TGTGTTCAGA 


1380 


GTTATGCCTT 


CTAAAATAAG 


M J ^tii TV TV TV TV TV 

CCTAAAAATC 


1440 


TCAGATCAAC 


AGCTGCCCTT 


ACTCGAGCTC 


1 con 


CGGCCGCTAA 


CTAATAGCCC 


AT TCTCCAAb 


130U 


CCTAACGTTA 


CTGGCCGAAG 


CCGCTTGGAA 


1620 


TTTTGCACCA 


TATTGCCGTC 


TTTTGGCAAT 


1680 


TTGACGAGCA 


TTCCTAGGGG 


TCTTTCCCCT 


1740 


GTCGTGAAG6 


AAGCAGTTCC 


TCTGGAAGCT 


1800 


CTTTGCAGGC 


AGCGGAACCC 


CCCACCTGGC 


1860 


GTATAAGATA 
GTGGAAAGAG 


CACCTGCAAA 
TCAAATGGCT 


GGCGGCACAA 
CTCCTCAAGC 


1920 
1980 


AAGGTACCCC 


ATTGTATGGG 


ATCTGATCTG 


2040 


TAGTCGAGGT 


TAAAAAAACG 


TCTAGGCCCC 


2100 


AAAACACGAT 


ACGG6ATCCA 


CCGGTCGCCA 


2160 


GAGTTGTCCC AATTCTTGTT 6AATTAGATG 


2220 


GTGGAGAGGG 


TGAAGGTGAT 


GCAACATACG 


2280 


CTGGAAAACT 


ACCTGTTCCA 


TGGCCAACAC 


2340 


GCTTTTCAAG 


ATACCCAGAT 


CATATGAAAC 


2400 
GGCATGACTT TTTCAAGAGT GCCATGCCCG AAGGTTATGT ACAGGAAAGA ACTATATTTT 2460 

TCAAAGATGA CGGGAACTAC AAGACACGTG CTGAAGTCAA GTTTGAAGGT GATACCCTTG 2520 

TTAATAGAAT CGAGTTAAAA GGTATTGATT TTAAAGAAGA TGGAAACATT CTTGGACACA 2580 

AATTGGAATA CAACTATAAC TCACACAATG TATACATCAT GGCAGACAAA CAAAAGAATG 2640 

GAACCAAAGT TAACTTCAAA ATTAGACACA ACATTGAAGA TGGAAGCGTT CAACTAGCAG 2700 

ACCATTATCA ACAAAATACT CCAATTGGCG ATGGCCCTGT CCTTTTACCA GACAACCATT 2760 

ACCTGTCCAC ACAATCTGCC CTTTCGAAAG ATCCCAACGA AAAGAGAGAC CACATGGTCC 2820 

TTCTTGAGTT TGTAACAGCT GCTGGGATTA CACATGGCAT GGATGAACTA TACAAGTCCG 2880 

GATCTAGATA ACTGTATCGA TGGATCCGAA GGCGGGGACA GCAGTGCAGT GGTGGACAGA 2940 

AAGCAAGTGA TCTAGGCCAG CAGCCTCCCT AAAGGGACTT CAGCCCACAA AGCCAAACTT 3000 

GTGGCTTTAA TACAAGCTCT GTAAATGGTA AAAAAAAAAA AGTCTACACG GACAGCAGGT 3060 

ATGCTCTTGC CACTGTACAG AGCAATATAC AGACAAAGAG AACTGTTGAC ATCTGCAGAG 3120 

AAAGACCTAA GATGCTGTGG CTAAAAGAAA TCAGATGGCA AATCTAACCG CCCAGGCATC 3180 

CTAAAGAGCA ATGATCCTGA CAGTCTGAAG ACTATCAAGT TATAGACAAA TTAAGACTGG 3240 

TAAAAAAAAC CCTGTATAAA ATAGTAAAAA CTGAAAAAAG AAAACTAGTC CTCTCATGAG 3300 

AAGACAGACC TGACATCTAC TGAAAAATAG ACTTTACTGG AAAAAATATG TGTATGAATA 3360 

CCTTCTAGTT TTTGTGAACG TTCTCAAGAT GGATAAAAGC TTTTCCTTGT AAAACGAGAC 3420 

TGATCAGATA GTCATCAAGA AGATTGTTAA AGAAAATTTT CCAAGGTTCG GAGTGCCAAA 3480 

AGCAATAGTG TCAGATAATG GTCCTGCCTT TGTTGCCCAG GTAAGTCAGG GTGTGGCCAA 3540 

GTATTTAGAG GTCAAATGAA AATTCCATTG TGTGTACAGA CCTCAGAGCT CAGGAAAGAT 3600 

AAAAAAGAAT AAATAAAACT CTAAACAGAG CTTGACAAAA TTAATCCTAG AGACTGGCAC 3660 

AGACTTACTT GGTACTCCTT CCCCTTGCCC TATTTAGAAC TGAGAATACT CCCTCTTGAT 3720 

TCGGTTTTAC TCTTTTTAAG ATCCTTTATG GGGCTCCTAT GCCATCACTG TCTTAAATGA 3780 

TGTGTTTAAA CCTATGTTGT TATAATAATG ATCTATATGT TAAGTTAAAA GGCTTGCAGG 3840 

TGGTGCAGAA AGAAGTCTGG TCACAACTGG CTACAGTGAA CAAGCTGGGT ACCCCAAGGA 3900 

CATCTTACCA GTTCCAGCCA GAGATCTGAT CTACGATCCC CGGGTCGACC CGGGTCGACC 3960 

CTGTGGAATG TGTGTCAGTT AGGGTGTGGA AAGTCCCCAG GCTCCCCAGC AGGCAGAAGT 4020 

ATGCAAAGCA TGCATCTCAA TTAGTCAGCA ACCAGGTGTG GAAAGTCCCC AGGCTCCCCA 4080 

GCAGGCAGAA GTATGCAAAG CATGCATCTC AATTAGTCAG CAACCATAGT CCCGCCCCTA 4140 

ACTCCGCCCA TCCCGCCCCT AACTCCGCCC AGTTCCGCCC ATTCTCCGCC CCATGGCTGA 4200 

CTAATTTTTT TTATTTATGC AGAGGCCGAG GCCGCCTCGG CCTCTGAGCT ATTCCAGAAG 4260 

TAGTGAGGAG GCTTTTTTGG AGGCCTAGGC TTTTGCAAAA AGCTTCACGC TGCCGCAAGC 4320 

ACTCAGGGCG CAAGGGCTGC TAAAGGAAGC GGAACACGTA GAAAGCCAGT CCGCAGAAAC 4380 
GGTGCTGACC CCGGATGAAT GTCAGCTACT GGGCTATCTG GACAAGGGAA AACGCAAGCG 4440 

CAAAGAGAAA GCAGGTAGCT TGCAGTGGGC TTACATGGCG ATAGCTAGAC TGGGCGGTTT 4500 

TATGGACAGC AAGCGAACCG GAATTGCCAG CTGGGGCGCC CTCTGGTAAG GTTGGGAAGC 4560 

CCTGCAAAGT AAACTGGATG GCTTTCTTGC CGCCAAGGAT CTGATGGCGC AGGGGATCAA 4620 

GATCTGATCA AGAGACAGGA TGAGGATCGT TTCGCATGAT TGAACAAGAT GGATTGCACG 4680 

CAGGTTCTCC GGCCGCTTGG GTGGAGAGGC TATTCGGCTA TGACTGGGCA CAACAGACAA 4740 

TCGGCTGCTC TGATGCCGCC GTGTTCCGGC TGTCAGCGCA GGGGCGCCCG GTTCTTTTTG 4800 

TCAAGACCGA CCTGTCCGGT GCCCTGAATG AACTGCAGGA CGAGGCAGCG CGGCTATCGT 4860 

GGCTGGCCAC GACGGGCGTT CCTTGCGCAG CTGTGCTCGA CGTTGTCACT GAAGCGGGAA 4920 

GGGACTGGCT GCTATTGGGC GAAGTGCCGG GGCAGGATCT CCTGTCATCT CACCTTGCTC 4980 

CTGCCGAGAA AGTATCCATC ATGGCTGATG CAATGCGGCG GCTGCATACG CTTGATCCGG 5040 

CTACCTGCCC ATTCGACCAC CAAGCGAAAC ATCGCATCGA GCGAGCACGT ACTCGGATGG 5100 

AAGCCGGTCT TGTCGATCAG GATGATCTGG ACGAAGAGCA TCAGGGGCTC GCGCCAGCCG 5160 

AACTGTTCGC CAGGCTCAAG GCGCGCATGC CCGACGGCGA GGATCTCGTC GTGACCCATG 5220 

GCGATGCCTG CTTGCCGAAT ATCATGGTGG AAAATGGCCG CTTTTCTGGA TTCATCGACT 5280 

GTGGCCGGCT GGGTGTGGCG GACCGCTATC AGGACATAGC GTTGGCTACC CGTGATATTG 5340 

CTGAAGAGCT TGGCGGCGAA TGGGCTGACC GCTTCCTCGT GCTTTACGGT ATCGCCGCTC 5400 

CCGATTCGCA GCGCATCGCC TTCTATCGCC TTCTTGACGA GTTCTTCTGA GCGGGACTCT 5460 

GGGGTTCGAA ATGACCGACC AAGCGACGCC CAACCTGCCA TCACGAGATT TCGATTCCAC 5520 

CGCCGCCTTC TATGAAAGGT TGGGCTTCGG AATCGTTTTC CGGGACGGAA TTCGTAATCT 5580 

GCTGCTTGCA AACAAAAAAA CCACCGCTAC CAGCGGTGGT TTGTTTGCCG GATCAAGAGC 5640 

TACCAACTCT TTTTCCGAAG GTAACTGGCT TCAGCAGAGC GCAGATACCA AATACTGTCC 5700 

TTCTAGTGTA GCCGTAGTTA GGCCACCACT TCAAGAACTC TGTAGCACCG CCTACATACC 5760 

TCGCTCTGCT AATCCTGTTA CCAGTGGCTG CTGCCAGTGG CGATAAGTCG TGTCTTACCG 5820 

GGTTGGACTC AAGACGATAG TTACCGGATA AGGCGCAGCG GTCGGGCTGA ACGGGGGGTT 5880 

CGTGCACACA GCCCAGCTTG GAGCGAACGA CCTACACCGA ACTGAGATAC CTACAGCGTG 5940 

AGCATTGAGA AAGCGCCACG CTTCCCGAAG GGAGAAAGGC GGACAGGTAT CCGGTAAGCG 6000 

GCAGGGTCGG AACAGGAGAG CGCACGAGGG AGCTTCCAGG GGGAAACGCC TGGTATCTTT 6060 

ATAGTCCTGT CGGGTTTCGC CACCTCTGAC TTGAGCGTCG ATTTTTGTGA TGCTCGTCAG 6120 

GGGGGCGGAG CCTATGGAAA AACGCCAGCA ACGCCGAGAT GCGCCGCCTC GAGTACACCT 6180 

GCGTCATGCT GAGACCCTCA AGCCTCACTA AAAGGGTCCC TGCCTAGTTC TGTTTACTAA 6240 

TCTGCCTTAT TCTGTTTTTG TTCCCATGTT AAAGATAGAG TAAATGCAGT ATTCTCCACA 6300 

TAGAGATATA GACTTCTGAA ATTCTAAGAT TAGAATTATT TACAAGAAGA AGTGGGGAA 6359 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6891 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

TGAAGAATAA 


AAAATTACTG 


GCCTCTTGTG 


AGAACATGAA 










CCCTCCCATC 


TGGAAAACAT 


ACTTGAGAAA AACATTTTCT 






ion 


20 


CAACAGGCCA 


GATGTATTGC 


CAAACACAGG 


ATATGACTCT 


TTGGTTuAla 1 




inn 


TTGTTAAACT 


TCCCCTATTC 


CCTCCCCATT 


CCCCCTCCCA 


GTTTGTGGTT 


1 i i Iv^Ui i 






AAAGCTTGTG 


AAAAATTTGA 


GTCGTCGTCG 


AGACTCCTCT 


AuCCTGTGGA 


ft a ^^^T^T ft 
AAlauIuXAXb 


inn 


25 


AGTTTCGACC 


CCAGAGCTCT 


GTGTGCTTTC 


TGTTGCTGCT 


1 lAL i iUuAU 








TGGTCTGTGT 


GCTTTCATGT 


CGCTGCTTTA 


TTAAATCTTA 




rp rp *p TV »7» p »n j\ fT» p 




30 


GTCTCAGTGT 


CTTCTTGGGT 


ACGCGGCTGT 


CCCGGGACTT 




o XuAuuu X w X 




TCCCTCGAGG 


GTCTTTCATT 


TGGTACATGG 


GCCGGGAATT 


CGAGAATCTT 


ICAI X XbVjiVa 






CATTGGCCGG GAATTCGAAA ATCTTTCATT 


TGGTGCATTG 


GCCGGGAAAC 




t3UU 


35 


CCCAGAGGTC 


CTAGACCCAC 


TTAGAGGTAA GATTCTTTGT 


TCTGTTTTGG 


TCTGATGTCT 


oou 




GTGTTCTGAT 


GTCTGTGTTC 


TGTTTCTAAG 


TCTGGTGCGA 


TCGCAGTTTC 


AGTTTTGCGG 


\ £.\J 


40 


ACGCTCAGTG 


AGACCGCGCT 


CCGAGAGGGA 


GTGCGGGGTG 


GATAAGGATA 


GACGTGTCCA 


/oU 


GGTGTCCACC GTCCGTTCGC CCTGGGAGAC GTCCCAGGAG GAACAGGGGA GGATCAGGGA 840 

CGCCTGGTGG ACCCCTTTGA AGGCCAAGAG ACCATTTGGG GTTGCGAGAT CGTGGGTTCG 900 

AGTCCCACCT CGTGCCCAGT TGCGAGATCG TGGGTTCGAG TCCCACCTCG TGTTTTGTTG 960 

CGAGATCGTG GGTTCGAGTC CCACCTCGCG TCTGGTCACG GGATCGTGGG TTCGAGTCCC 1020 

ACCTCGTGTT TTGTTGCGAG ATCGTGGGTT CGAGTCCCAC CTCGCGTCTG GTCACGGGAT 1080 

CGTGGGTTCG AGTCCCACCT CGTGCAGAGG GTCTCAATTG GCCGGCCTTA GAGAGGCCAT 1140 

CTGATTCTTC TGGTTTCTCT TTTTGTCTTA GTCTCGTGTC CGCTCTTGTT GTGACTACTG 1200 

TTTTTCTAAA AATGGGACAA TCTGTGTCCA CTCCCCTTTC TCTGACTCTG GTTCTGTCGC 1260 

TTGGTAATTT TGTTTGTTTA CGTTTGTTTT TGTGAGTCGT CTATGTTGTC TGTTACTATC 1320 

TTGTTTTTGT TTGTGGTTTA CGGTTTCTGT GTGTGTCTTG TGTGTCTCTT TGTGTTCAGA 1380 

CTTGGACTGA TGACTGACGA CTGTTTTTAA GTTATGCCTT CTAAAATAAG CCTAAAAATC 1440 

CTGTCAGATC CCTATGCTGA CCACTTCCTT TCAGATCAAC AGCTGCCCTT ACTCGAGCTC 1500 

AAGCTTCGAA TTCTGCAGTC GACGGTACCG CGGGGATCAA TTCCGCCCCC CCCCTAACGT 1560 
TACTGGCCGA AGCCGCTTGG AATAAGGCCG GTGTGCGTTT GTCTATATGT TATTTTCCAC 1620 

CATATTGCCG TCTTTTGGCA ATGTGAGGGC CCGGAAACCT GGCCCTGTCT TCTTGACGAG 1680 

CATTCCTAGG GGTCTTTCCC CTCTCGCCAA AGGAATGCAA GGTCTGTTGA ATGTCGTGAA 1740 

GGAAGCAGTT CCTCTGGAAG CTTCTTGAAG ACAAACAACG TCTGTAGCGA CCCTTTGCAG 1800 

GCAGCGGAAC CCCCCACCTG GCGACAGGTG CCTCTGCGGC CAAAAGCCAC GTGTATAAGA 1860 

TACACCTGCA AAGGCGGCAC AACCCCAGTG CCACGTTGTG AGTTGGATAG TTGTGGAAAG 1920 

AGTCAAATGG CTCTCCTCAA GCGTATTCAA CAAGGGGCTG AAGGATGCCC AGAAGGTACC 1980 

CCATTGTATG GGATCTGATC TGGGGCCTCG GTGCACATGC TTTACATGTG TTTAGTCGAG 2040 

GTTAAAAAAC GTCTAGGCCC CCCGAACCAC GGGGACGTGG TTTTCCTTTG AAAAACACGA 2100 

GCGGGATCAA TTCCGCCCCC CCCCTAACGT TACTGGCCGA AGCCGCTTGG AATAAGGCCG 2160 

GTGTGCGTTT GTCTATATGT TATTTTCCAC CATATTGCCG TCTTTTGGCA ATGTGAGGGC 2220 

CCGGAAACCT GGCCCTGTCT TCTTGACGAG CATTCCTAGG GGTCTTTCCC CTCTCGCCAA 2280 

AGGAATGCAA GGTCTGTTGA ATGTCGTGAA GGAAGCAGTT CCTCTGGAAG CTTCTTGAAG 2340 

ACAAACAACG TCTGTAGCGA CCCTTTGCAG GCAGCGGAAC CCCCCACCTG GCGACAGGTG 2400 

CCTCTGCGGC CAAAAGCCAC GTGTATAAGA TACACCTGCA AAGGCGGCAC AACCCCAGTG 2460 

CCACGTTGTG AGTTGGATAG TTGTGGAAAG AGTCAAATGG CTCTCCTCAA GCGTATTCAA 2520 

CAAGGGGCTG AAGGATGCCC AGAAGGTACC CCATTGTATG GGATCTGATC TGGGGCCTCG 2580 

GTGCACATGC TTTACATGTG TTTAGTCGAG GTTAAAAAAA CGTCTAGGCC CCCCGAACCA 2640 

CGGGGACGTG GTTTTCCTTT GAAAAACACG ATACGGGATC CACCGGTCGC CACCATGGGT 2700 

AAAGGAGAAG AACTTTTCAC AGGAGTTGTC CCAATTCTTG TTGAATTAGA TGGTGATGTT 2760 

AATGGGCACA AATTTTCTGT CAGTGGAGAG GGTGAAGGTG ATGCAACATA CGGAAAACTT 2820 

ACCCTTAAAT TTATTTGCAC TACTGGAAAA CTACCTGTTC CATGGCCAAC ACTTGTCACT 2880 

ACTTTCACTT ATGGTGTTCA ATGCTTTTCA AGATACCCAG ATCATATGAA ACGGCATGAC 2940 

TTTTTCAAGA GTGCCATGCC CGAAGGTTAT GTACAGGAAA GAACTATATT TTTCAAAGAT 3000 

GACGGGAACT ACAAGACACG TGCTGAAGTC AAGTTTGAAG GTGATACCCT TGTTAATAGA 3060 

ATCGAGTTAA AAGGTATTGA TTTTAAAGAA GATGGAAACA TTCTTGGACA CAAATTGGAA 3120 

TACAACTATA ACTCACACAA TGTATACATC ATGGCAGACA AACAAAAGAA TGGAACCAAA 3180 

GTTAACTTCA AAATTAGACA CAACATTGAA GATGGAAGCG TTCAACTAGC AGACCATTAT 3240 

CAACAAAATA CTCCAATTGG CGATGGCCCT GTCCTTTTAC CAGACAACCA TTACCTGTCC 3300 

ACACAATCTG CCCTTTCGAA AGATCCCAAC GAAAAGAGAG ACCACATGGT CCTTCTTGAG 3360 

TTTGTAACAG CTGCTGGGAT TACACATGGC ATGGATGAAC TATACAAGTC CGGATCTAGA 3420 

TAACTGTATC GATGGATCCG AAGGCGGGGA CAGCAGTGCA GTGGTGGACA GAAAGCAAGT 3480 

GATCTAGGCC AGCAGCCTCC CTAAAGGGAC TTCAGCCCAC AAAGCCAAAC TTGTGGCTTT 3540 

AATACAAGCT CTGTAAATGG TAAAAAAAAA AAAGTCTACA CGGACAGCAG GTATGCTCTT 3600 
GCCACTGTAC AGAGCAATAT ACAGACAAAG 



AAGATGCTGT GGCTAAAAGA AATCAGATGG
CAATGATCCT GACAGTCTGA AGACTATCAA
ACCCTGTATA AAATAGTAAA AACTGAAAAA
CCTGACATCT ACTGAAAAAT AGACTTTACT
TTTTTGTGAA CGTTCTCAAG ATGGATAAAA
TAGTCATCAA GAAGATTGTT AAAGAAAATT
TGTCAGATAA TGGTCCTGCC TTTGTTGCCC
AGGTCAAATG AAAATTCCAT TGTGTGTACA
ATAAATAAAA CTCTAAACAG ACCTTGACAA
TTGGTACTCC TTCCCCTTGC CCTATTTAGA
ACTCTTTTTA AGATCCTTTA TGGGGCTCCT
AACCTATGTT GTTATAATAA TGATCTATAT
AAAGAAGTCT GGTCACAACT GGCTACAGTG
CAGTTCCAGC CAGAGATCTG ATCTACGATC
TGTGTGTCAG TTAGGGTGTG GAAAGTCCCC
CATGCATCTC AATTAGTCAG CAACCAGGTG
AAGTATGCAA AGCATGCATC TCAATTAGTC
CATCCCGCCC CTAACTCCGC CCAGTTCCGC
TTTTATTTAT GCAGAGGCCG AGGCCGCCTC
AGGCTTTTTT GGAGGCCTAG GCTTTTGCAA
CGCAAGGGCT GCTAAAGGAA GCGGAACACG
CCCCGGATGA ATGTCAGCTA CTGGGCTATC
AAGCAGGTAG CTTGCAGTGG GCTTACATGG
GCAAGCGAAC CGGAATTGCC AGCTGGGGCG
GTAAACTGGA TGGCTTTCTT GCCGCCAAGG
CAAGAGACAG GATGAGGATC GTTTCGCATG
CCGGCCGCTT GGGTGGAGAG GCTATTCGGC
TCTGATGCCG CCGTGTTCCG GCTGTCAGCG
GACCTGTCCG GTGCCCTGAA TGAACTGCAG
ACGACGGGCG TTCCTTGCGC AGCTGTGCTC
CTGCTATTGG GCGAAGTGCC GGGGCAGGAT
AAAGTATCCA TCATGGCTGA TGCAATGCGG



AGAACTGTTG 


ACATCTGCAG 


AGAAAGACCT 


3660 


CAAATCTAAC 


CGCCCAGGCA 


TCCTAAAGAG 


3720 


GTTATAGACA AATTAAGACT 


GGTAAAAAAA 


3780 


AGAAAACTAG 


TCCTCTCATG 


AGAAGACAGA 


3840 


GGAAAAAATA 


TGTGTATGAA 


TACCTTCTAG 


3900 


GCTTTTCCTT 


GTAAAACGAG 


ACTGATCAGA 


3960 


TTCCAAGGTT 


CGGAGTGCCA 


AAAGCAATAG 


4020 


AGGTAAGTCA


GGGTGTGGCC 


AAGTATTTAG 


4080 


GACCTCAGAG


CTCAGGAAAG 


ATAAAAAAGA 


4140 


AATTAATCCT 


AGAGACTGGC 


ACAGACTTAC 


4200 


ACTGAGAATA 


CTCCCTCTTG 


ATTCGGTTTT 


4260 


ATGCCATCAC 


TGTCTTAAAT 


GATGTGTTTA 


4320 


GTTAAGTTAA 


AAGGCTTGCA 


GGTGGTGCAG 


4380 


AACAAGCTGG 


GTACCCCAAG 


GACATCTTAC 


4440 


CCCGGGTCGA CCCGGGTCGA 


CCCTGTGGAA 


4500 


AGGCTCCCCA 


GCAGGCAGAA 


GTATGCAAAG 


4560 


TGGAAAGTCC 


CCAGGCTCCC 


CAGCAGGCAG 


4620 


AGCAACCATA GTCCCGCCCC 


TAACTCCGCC 


4680 


CCATTCTCCG 


CCCCATGGCT 


GACTAATTTT 


4740 


GGCCTCTGAG 


CTATTCCAGA 


AGTAGTGAGG 


4800 


AAAGCTTCAC 


GCTGCCGCAA 


GCACTCAGGG 


4860 


TAGAAAGCCA 


GTCCGCAGAA 


ACGGTGCTGA 


4920 


TGGACAAGGG 
CGATAGCTAG 


AAAACGCAAG 
ACTGGGCGGT 


CGCAAAGAGA 
TTTATGGACA 


4980
5040 


CCCTCTGGTA 


AGGTTGGGAA 


GCCCTGCAAA 


5100 


ATCTGATGGC 


GCAGGGGATC 


AAGATCTGAT 


5160 


ATTGAACAAG 


ATGGATTGCA 


CGCAGGTTCT 


5220 


TATGACTGGG 


CACAACAGAC 


AATCGGCTGC 


5280 


CAGGGGCGCC


CGGTTCTTTT 


TGTCAAGACC 


5340 


GACGAGGCAG 


CGCGGCTATC 


GTGGCTGGCC 


5400 


GACGTTGTCA 


CTGAAGCGGG 


AAGGGACTGG 


5460 


CTCCTGTCAT 


CTCACCTTGC 


TCCTGCCGAG 


5520 


CGGCTGCATA 


CGCTTGATCC 


GGCTACCTGC 


5580 



CCATTCGACC ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT 5640

CTTGTCGATC AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC 5700 

GCCAGGCTCA AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC 5760 

TGCTTGCCGA ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG 5820 

CTGGGTGTGG CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG 5880 

CTTGGCGGCG AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG 5940 

CAGCGCATCG CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG 6000 

AAATGACCGA CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT 6060

TCTATGAAAG GTTGGGCTTC GGAATCGTTT TCCGGGACGG AATTCGTAAT CTGCTGCTTG 6120 

CAAACAAAAA AACCACCGCT ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT 6180 


CTTTTTCCGA AGGTAACTGG CTTCAGCAGA GCGCAGATAC CAAATACTGT CCTTCTAGTG 6240 

TAGCCGTAGT TAGGCCACCA CTTCAAGAAC TCTGTAGCAC CGCCTACATA CCTCGCTCTG 6300 

CTAATCCTGT TACCAGTGGC TGCTGCCAGT GGCGATAAGT CGTGTCTTAC CGGGTTGGAC 6360

TCAAGACGAT AGTTACCGGA TAAGGCGCAG CGGTCGGGCT GAACGGGGGG TTCGTGCACA 6420 

CAGCCCAGCT TGGAGCGAAC GACCTACACC GAACTGAGAT ACCTACAGCG TGAGCATTGA 6480

GAAAGCGCCA CGCTTCCCGA AGGGAGAAAG GCGGACAGGT ATCCGGTAAG CGGCAGGGTC 6540

GGAACAGGAG AGCGCACGAG GGAGCTTCCA GGGGGAAACG CCTGGTATCT TTATAGTCCT 6600

GTCGGGTTTC GCCACCTCTG ACTTGAGCGT CGATTTTTGT GATGCTCGTC AGGGGGGCGG 6660

AGCCTATGGA AAAACGCCAG CAACGCCGAG ATGCGCCGCC TCGAGTACAC CTGCGTCATG 6720 

CTGAGACCCT CAAGCCTCAC TAAAAGGGTC CCTGCCTAGT TCTGTTTACT AATCTGCCTT 6780

ATTCTGTTTT TGTTCCCATG TTAAAGATAG AGTAAATGCA GTATTCTCCA CATAGAGATA 6840 

TAGACTTCTG AAATTCTAAG ATTAGAATTA TTTACAAGAA GAAGTGGGGA A 6891 






(2) INFORMATION FOR SEQ ID NO: 17:
 (i) SEQUENCE CHARACTERISTICS:
 (A) LENGTH: 6321 base pairs
 (B) TYPE: nucleic acid
 (C) STRANDEDNESS: single
 (D) TOPOLOGY: linear
 (ii) MOLECULE TYPE: DNA (genomic)
 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TGAAGAATAA AAAATTACTG GCCTCTTGTG AGAACATGAA CTTTCACCTC GGAGCCCACC 60 

CCCTCCCATC TGGAAAACAT ACTTGAGAAA AACATTTTCT GGAACAACCA CAGAATGTTT 120 

CAACAGGCCA GATGTATTGC CAAACACAGG ATATGACTCT TTGGTTGAGT AAATTTGTGG 180 

TTGTTAAACT TCCCCTATTC CCTCCCCATT CCCCCTCCCA GTTTGTGGTT TTTTCCTTTA 240
AAAGCTTGTG AAAAATTTGA GTCGTCGTCG
AGTTTCGACC CCAGAGCTCT GTGTGCTTTC
TGGTCTGTGT GCTTTCATGT CGCTGCTTTA
GTCTCAGTGT CTTCTTGGGT ACGCGGCTGT
TCCCTCGAGG GTCTTTCATT TGGTACATGG
CATTGGCCGG GAATTCGAAA ATCTTTCATT
CCCAGAGGTC CTAGACCCAC TTAGAGGTAA
GTGTTCTGAT GTCTGTGTTC TGTTTCTAAG
ACGCTCAGTG AGACCGCGCT CCGAGAGGGA
GGTGTCCACC GTCCGTTCGC CCTGGGAGAC
CGCCTGGTGG ACCCCTTTGA AGGCCAAGAG
AGTCCCACCT CGTGCCCAGT TGCGAGATCG
CGAGATCGTG GGTTCGAGTC CCACCTCGCG
ACCTCGTGTT TTGTTGCGAG ATCGTGGGTT
CGTGGGTTCG AGTCCCACCT CGTGCAGAGG
CTGATTCTTC TGGTTTCTCT TTTTGTCTTA
TTTTTCTAAA AATGGGACAA TCTGTGTCCA
TTGGTAATTT TGTTTGTTTA CGTTTGTTTT
TTGTTTTTGT TTGTGGTTTA CGGTTTCTGT
CTTGGACTGA TGACTGACGA CTGTTTTTAA
CTGTCAGATC CCTATGCTGA CCACTTCCTT
AAGCTTCGAA TTCTGCAGTC GACGGTACCG
TACTGGCCGA AGCCGCTTGG AATAAGGCCG
CATATTGCCG TCTTTTGGCA ATGTGAGGGC
CATTCCTAGG GGTCTTTCCC CTCTCGCCAA
GGAAGCAGTT CCTCTGGAAG CTTCTTGAAG
GCAGCGGAAC CCCCCACCTG GCGACAGGTG
TACACCTGCA AAGGCGGCAC AACCCCAGTG
AGTCAAATGG CTCTCCTCAA GCGTATTCAA
CCATTGTATG GGATCTGATC TGGGGCCTCG
GTTAAAAAAA CGTCTAGGCC CCCCGAACCA
ATACGGGATC CACCGGTCGC CACCATGGGT
CCAATTCTTG TTGAATTAGA TGGTGATGTT



AGACTCCTCT 


ACCCTGTGCA 


AAGGTGTATG 


300 


TGTTGCTGCT 


TTATTTCGAC 


CCCAGAGCTC 


360 


TTAAATCTTA 


CCTTCTACAT 



TTTATGTATG 



420 


CCCGGGACTT 


GAGTGTCTGA 


GTGAGGGTCT 


480 


GCCGGGAATT 


CGAGAATCTT 


TCATTTGGTG 


540 


TGGTGCATTG 


GCCGGGAAAC 


AGCGCGACCA 


600 


GATTCTTTGT 


TCTGTTTTGG 



TCTGATGTCT 


660 


TCTGGTGCGA 


TCGCAGTTTC 



AGTTTTGCGG 


720 


GTGCGGGGTG 


GATAAGGATA 



GACGTGTCCA


780 


GTCCCAGGAG 


GAACAGGGGA 


GGATCAGGGA 


840 


ACCATTTGGG 


GTTGCGAGAT 


CGTGGGTTCG 


900 


TGGGTTCGAG 


TCCCACCTCG 


TGTTTTGTTG 


960 


TCTGGTCACG 


GGATCGTGGG


TTCGAGTCCC 


1020 


CGAGTCCCAC 


CTCGCGTCTG 


GTCACGGGAT 


1080 


GTCTCAATTG 


GCCGGCCTTA 


GAGAGGCCAT 


1140 


GTCTCGTGTC 


CGCTCTTGTT



GTGACTACTG 


1200 


wAWwwwA X JL w 


TCTGACTCTG 



GTTCTGTCGC 


1260 




^xnXwX x\3X\^ 


TGTTACTATC 


1320 





XUXVSXwX^^X X 


TGTGTTCAGA 



1380 


vj X xx^x \3^•<>v■r X X 


CTAAAATAAG 


CCTAAAAATC 


1440 


1 \^n,\3t\ X wrm^ 


AGCTGCCCTT 


ACTCGAGCTC 


1500 




TTCCGCCCCC 


CCCCTAACGT 


1560 


V9X\3X\9wV9X X X 


GTCTATATGT 


TATTTTCCAC 


1620 


CCGGAAACCT



GGCCCTGTCT 


TCTTGACGAG 


1680 


AGGAATGCAA


GGTCTGTTGA ATGTCGTGAA 


1740 



ACAAACAACG


TCTGTAGCGA 


CCCTTTGCAG 


1800 


CCTCTGCGGC 


CAAAAGCCAC 


GTGTATAAGA 


1860 


CCACGTTGTG 


AGTTGGATAG 


TTGTGGAAAG 


1920 


CAAGGGGCTG 


AAGGATGCCC 


AGAAGGTACC 


1980 


GTGCACATGC 


TTTACATGTG 


TTTAGTCGAG 


2040 


CGGGGACGTG


GTTTTCCTTT 


GAAAAACACG 


2100 


AAAGGAGAAG AACTTTTCAC AGGAGTTGTC 


2160 


AATGGGCACA AATTTTCTGT 


CAGTGGAGAG 


2220 
ATGCAACATA


CGGAAAACTT 


ACCCTTAAAT 


TTATTTGCAC 


TACTGGAAAA 


2280


CTACCTGTTC 


CATGGCCAAC 


ACTTGTCACT 


ACTTTCACTT 


ATGGTGTTCA ATGCTTTTCA 


2340


AGATACCCAG 


ATCATATGAA 


ACGGCATGAC 


TTTTTCAAGA 


GTGCCATGCC 


CGAAGGTTAT 




GTACAGGAAA 


GAACTATATT 


TTTCAAAGAT 


GACGGGAACT 


ACAAGACACG 


TGCTGAAGTC 




AAGTTTGAAG 


GTGATACCCT 


TGTTAATAGA ATCGAGTTAA


AAGGTATTGA 


TTTTAAAGAA 


2520


GATGGAAACA 


TTCTTGGACA 


CAAATTGGAA 


TACAACTATA 


ACTCACACAA 


TGTATACATC 




ATGGCAGACA AACAAAAGAA 


TGGAACCAAA 


GTTAACTTCA 


AAATTAGACA 


CAACATTGAA 


2640


GATGGAAGCG 


TTCAACTAGC 


AGACCATTAT 


CAACAAAATA 


CTCCAATTGG 


CGATGGCCCT 


2700


GTCCTTTTAC 


CAGACAACCA 


TTACCTGTCC 


ACACAATCTG 


CCCTTTCGAA AGATCCCAAC 


2760


GAAAAGAGAG 


ACCACATGGT 


CCTTCTTGAG 


TTTGTAACAG 


CTGCTGGGAT 


TACACATGGC 




ATGGATGAAC


TATACAAGTC 


CGGATCTAGA


TAACTGTATC 


GATGGATCCG 


AAGGCGGGGA 




CAGCAGTGCA 


GTGGTGGACA 


GAAAGCAAGT


GATCTAGGCC 


AGCAGCCTCC 


CTAAAGGGAC 




TTCAGCCCAC 


AAAGCCAAAC 


TTGTGGCTTT 


AATACAAGCT 


CTGTAAATGG 


TAAAAAAAAA 


3000 


AAAGTCTACA 


CGGACAGCAG 


GTATGCTCTT 


GCCACTGTAC 


AGAGCAATAT 


ACAGACAAAG 


"3 A ^ A 


AGAACTGTTG 


ACATCTGCAG 


AGAAAGACCT 


AAGATGCTGT 


GGCTAAAAGA 


AATCAGATGG 



3120 


CAAATCTAAC 


CGCCCAGGCA 


TCCTAAAGAG 


CAATGATCCT 


GACAGTCTGA 


AGACTATCAA 


3180


GTTATAGACA AATTAAGACT 


GGTAAAAAAA ACCCTGTATA AAATAGTAAA


AACTGAAAAA 


3240 


AGAAAACTAG 


TCCTCTCATG 


AGAAGACAGA 


CCTGACATCT 


ACTGAAAAAT 


AGACTTTACT 


3300 


GGAAAAAATA 


TGTGTATGAA 


TACCTTCTAG 


TTTTTGTGAA CGTTCTCAAG 


ATGGATAAAA 


3360 


GCTTTTCCTT 


GTAAAACGAG 


ACTGATCAGA


TAGTCATCAA


GAAGATTGTT 


AAAGAAAATT 


3420 


TTCCAAGGTT 


CGGAGTGCCA 


AAAGCAATAG 


TGTCAGATAA 


TGGTCCTGCC 


TTTGTTGCCC 


3480 


AGGTAAGTCA 
GACCTCAGAG 


GGGTGTGGCC 
CTCAGGAAAG 


AAGTATTTAG AGGTCAAATG 
ATAAAAAAGA ATAAATAAAA 


AAAATTCCAT 
CTCTAAACAG 


TGTGTGTACA 
ACCTTGACAA 


3540 

3600


AATTAATCCT 


AGAGACTGGC 


ACAGACTTAC 


TTGGTACTCC 


TTCCCCTTGC 


CCTATTTAGA 


"3 f f A 

3660 


ACTGAGAATA 


CTCCCTCTTG 


ATTCGGTTTT


ACTCTTTTTA 


AGATCCTTTA 


TGGGGCTCCT 


3720 


ATGCCATCAC 


TGTCTTAAAT 


GATGTGTTTA AACCTATGTT 


GTTATAATAA 


TGATCTATAT 


3780


GTTAAGTTAA AAGGCTTGCA 


GGTGGTGCAG 


AAAGAAGTCT 


GGTCACAACT 


GGCTACAGTG 


3840 


AACAAGCTGG 


GTACCCCAAG 


GACATCTTAC 


CAGTTCCAGC 


CAGAGATCTG 


ATCTACGATC 


3900 


CCCGGGTCGA 


CCCGGGTCGA 


CCCTGTGGAA 


TGTGTGTCAG 


TTAGGGTGTG 


GAAAGTCCCC 


3960 


AGGCTCCCCA 


GCAGGCAGAA GTATGCAAAG 


CATGCATCTC 


AATTAGTCAG


CAACCAGGTG 


4020 


TGGAAAGTCC 


CCAGGCTCCC 


CAGCAGGCAG AAGTATGCAA AGCATGCATC


TCAATTAGTC 


4080 


AGCAACCATA GTCCCGCCCC 


TAACTCCGCC 


CATCCCGCCC 


CTAACTCCGC 


CCAGTTCCGC 


4140 


CCATTCTCCG 


CCCCATGGCT 


GACTAATTTT 


TTTTATTTAT 


GCAGAGGCCG 


AGGCCGCCTC 


4200 


GGCCTCTGAG 


CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 


GGAGGCCTAG 


GCTTTTGCAA 


4260 
AAAGCTTCAC


GCTGCCGCAA 


GCACTCAGGG 


CGCAAGGGCT 


GCTAAAGGAA


GCGGAACACG 


4320 


TAGAAAGCCA


GTCCGCAGAA ACGGTGCTGA 


CCCCGGATGA 


ATGTCAGCTA 


CTGGGCTATC 


4380 


TGGACAAGGG


AAAACGCAAG 


CGCAAAGAGA AAGCAGGTAG 


CTTGCAGTGG 


GCTTACATGG 


4440 


CGATAGCTAG 


ACTGGGCGGT 


TTTATGGACA GCAAGCGAAC 


CGGAATTGCC 


AGCTGGGGCG 


4500 


CCCTCTGGTA 


AGGTTGGGAA 


GCCCTGCAAA 


GTAAACTGGA 


TGGCTTTCTT 


GCCGCCAAGG 


4560 


ATCTGATGGC 


GCAGGGGATC 


AAGATCTGAT 


CAAGAGACAG 


GATGAGGATC 


GTTTCGCATG 


4620 


ATTGAACAAG 


ATGGATTGCA 


CGCAGGTTCT 


CCGGCCGCTT 


GGGTGGAGAG 


GCTATTCGGC 


4680 


TATGACTGGG 


CACAACAGAC 


AATCGGCTGC 


TCTGATGCCG 


CCGTGTTCCG 


GCTGTCAGCG 


4740 


CAGGGGCGCC


CGGTTCTTTT 


TGTCAAGACC 


GACCTGTCCG 


GTGCCCTGAA 


TGAACTGCAG 


4800 


GACGAGGCAG 


CGCGGCTATC 


GTGGCTGGCC 


ACGACGGGCG 


TTCCTTGCGC 


AGCTGTGCTC 


4860 


GACGTTGTCA 


CTGAAGCGGG 


AAGGGACTGG 


CTGCTATTGG 


GCGAAGTGCC 


GGGGCAGGAT 


4920 


CTCCTGTCAT 


CTCACCTTGC 


TCCTGCCGAG 


AAAGTATCCA 


TCATGGCTGA 


TGCAATGCGG 


4980 


CGGCTGCATA 


CGCTTGATCC 


GGCTACCTGC 


CCATTCGACC 


ACCAAGCGAA 


ACATCGCATC 


5040 


GAGCGAGCAC 


GTACTCGGAT 


GGAAGCCGGT 


CTTGTCGATC 


AGGATGATCT 


GGACGAAGAG 


5100 


CATCAGGGGC


TCGCGCCAGC 


CGAACTGTTC 


GCCAGGCTCA 


AGGCGCGCAT 


GCCCGACGGC 


5160 


GAGGATCTCG 


TCGTGACCCA 


TGGCGATGCC 


TGCTTGCCGA 


ATATCATGGT 


GGAAAATGGC 



5220 


CGCTTTTCTG 


GATTCATCGA 


CTGTGGCCGG 


CTGGGTGTGG


CGGACCGCTA 


TCAGGACATA 



5280 


GCGTTGGCTA 


CCCGTGATAT 


TGCTGAAGAG 


CTTGGCGGCG 


AATGGGCTGA 


CCGCTTCCTC 


5340 



GTGCTTTACG 


GTATCGCCGC 


TCCCGATTCG 


CAGCGCATCG


CCTTCTATCG 


CCTTCTTGAC 


5400 


GAGTTCTTCT 


GAGCGGGACT 


CTGGGGTTCG 


AAATGACCGA 



CCAAGCGACG 


CCCAACCTGC 


5460


CATCACGAGA 
TCCGGGACGG 


TTTCGATTCC 
AATTCGTAAT 


ACCGCCGCCT 
CTGCTGCTTG 


TCTATGAAAG 
CAAACAAAAA 


GTTGGGCTTC 
AACCACCGCT 


GGAATCGTTT 
ACCAGCGGTG 


5520

5580 


GTTTGTTTGC


CGGATCAAGA 


GCTACCAACT 


CTTTTTCCGA 


AGGTAACTGG 


CTTCAGCAGA 


5640 


GCGCAGATAC 


CAAATACTGT 


CCTTCTAGTG 


TAGCCGTAGT 


TAGGCCACCA 


CTTCAAGAAC 


5700 


TCTGTAGCAC 


CGCCTACATA 


CCTCGCTCTG 


CTAATCCTGT 


TACCAGTGGC 


TGCTGCCAGT 


5760 


GGCGATAAGT 


CGTGTCTTAC 


CGGGTTGGAC 


TCAAGACGAT 


AGTTACCGGA 


TAAGGCGCAG 


5820 


CGGTCGGGCT 


GAACGGGGGG 


TTCGTGCACA 


CAGCCCAGCT 


TGGAGCGAAC 


GACCTACACC 


5880 


GAACTGAGAT 


ACCTACAGCG 


TGAGCATTGA 


GAAAGCGCCA 


CGCTTCCCGA 


AGGGAGAAAG 


5940 


GCGGACAGGT 


ATCCGGTAAG 


CGGCAGGGTC 


GGAACAGGAG 


AGCGCACGAG 


GGAGCTTCCA 


6000 


GGGGGAAACG


CCTGGTATCT 


TTATAGTCCT 


GTCGGGTTTC 


GCCACCTCTG 


ACTTGAGCGT 


6060 


CGATTTTTGT 


GATGCTCGTC 


AGGGGGGCGG 


AGCCTATGGA 


AAAACGCCAG 


CAACGCCGAG 


6120 


ATGCGCCGCC 


TCGAGTACAC 


CTGCGTCATG


CTGAGACCCT 


CAAGCCTCAC 


TAAAAGGGTC 


6180 


CCTGCCTAGT 


TCTGTTTACT 


AATCTGCCTT 


ATTCTGTTTT 


TGTTCCCATG 


TTAAAGATAG 


6240 
AGTAAATGCA GTATTCTCCA CATAGAGATA TAGACTTCTG AAATTCTAAG ATTAGAATTA 6300
TTTACAAGAA GAAGTGGGGA A 6321 
(2) INFORMATION FOR SEQ ID NO: 18:
 (i) SEQUENCE CHARACTERISTICS:
 (A) LENGTH: 5754 base pairs
 (B) TYPE: nucleic acid
 (C) STRANDEDNESS: single
 (D) TOPOLOGY: linear
 (ii) MOLECULE TYPE: DNA (genomic)
 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:



TGAAGAATAA 


AAAATTACTG 


GCCTCTTGTG 


AGAACATGAA 


CTTTCACCTC 


GGAGCCCACC 


60 


CCCTCCCATC 


TGGAAAACAT 


ACTTGAGAAA 


AACATTTTCT 


GGAACAACCA CAGAATGTTT 


120 


CAACAGGCCA 


GATGTATTGC 


CAAACACAGG 


ATATGACTCT 


TTGGTTGAGT 


AAATTTGTGG 


180 


TTGTTAAACT 


TCCCCTATTC 


CCTCCCCATT 


CCCCCTCCCA 


GTTTGTGGTT 


TTTTCCTTTA 


240 


AAAGCTTGTG 


AAAAATTTGA 


GTCGTCGTCG 


AGACTCCTCT 


ACCCTGTGCA AAGGTGTATG 


300 


AGTTTCGACC 


CCAGAGCTCT


GTGTGCTTTC 


TGTTGCTGCT 


TTATTTCGAC 


CCCAGAGGTC 


360 


TGGTCTGTGT 


GCTTTCATGT 


CGCTGCTTTA 


TTAAATCTTA 


CCTTCTACAT 


TTTATGTATG 


420 


GTCTCAGTGT 


CTTCTTGGGT 


ACGCGGCTGT 


CCCGGGACTT 


GAGTGTCTGA GTGAGGGTCT 


480 


TCCCTCGAGG 


GTCTTTCATT 


TGGTACATGG 


GCCGGGAATT 


CGAGAATCTT 


TCATTTGGTG 


540 


CATTGGCCGG 


GAATTCGAAA 


ATCTTTCATT 


TGGTGCATTG 


GCCGGGAAAC 


AGCGCGACCA 


600 


CCCAGAGGTC 


CTAGACCCAC 


TTAGAGGTAA 


GATTCTTTGT 


TCTGTTTTGG 


TCTGATGTCT 


660 


GTGTTCTGAT 


GTCTGTGTTC 


TGTTTCTAAG 


TCTGGTGCGA 


TCGCAGTTTC 


AGTTTTGCGG 


720 


ACGCTCAGTG 


AGACCGCGCT 


CCGAGAGGGA


GTGCGGGGTG 


GATAAGGATA 


GACGTGTCCA 


780 


GGTGTCCACC 


GTCCGTTCGC 


CCTGGGAGAC 


GTCCCAGGAG 


GAACAGGGGA 


GGATCAGGGA 


840 


CGCCTGGTGG 


ACCCCTTTGA 


AGGCCAAGAG 


ACCATTTGGG 


GTTGCGAGAT 


CGTGGGTTCG 


900 


AGTCCCACCT 


CGTGCCCAGT 


TGCGAGATCG 


TGGGTTCGAG 


TCCCACCTCG 


TGTTTTGTTG 


960 


CGAGATCGTG 


GGTTCGAGTC 


CCACCTCGCG 


TCTGGTCACG 


GGATCGTGGG 


TTCGAGTCCC 


1020 


ACCTCGTGTT 


TTGTTGCGAG 


ATCGTGGGTT 


CGAGTCCCAC 


CTCGCGTCTG 


GTCACGGGAT 


1080 


CGTGGGTTCG 


AGTCCCACCT 


CGTGCAGAGG 


GTCTCAATTG 


GCCGGCCTTA GAGAGGCCAT 


1140 


CTGATTCTTC 


TGGTTTCTCT 


TTTTGTCTTA 


GTCTCGTGTC 


CGCTCTTGTT 


GTGACTACTG 


1200 


TTTTTCTAAA 


AATGGGACAA


TCTGTGTCCA 


CTCCCCTTTC 


TCTGACTCTG 


GTTCTGTCGC 


1260 


TTGGTAATTT


TGTTTGTTTA 


CGTTTGTTTT 


TGTGAGTCGT 


CTATGTTGTC 


TGTTACTATC 


1320 


TTGTTTTTGT 


TTGTGGTTTA 


CGGTTTCTGT 


GTGTGTCTTG 


TGTGTCTCTT 


TGTGTTCAGA 


1380 


CTTGGACTGA 


TGACTGACGA 


CTGTTTTTAA 


GTTATGCCTT 


CTAAAATAAG 


CCTAAAAATC 


1440 
CTGTCAGATC CCTATGCTGA CCACTTCCTT
AAGCTTCGAA TTCTGCAGTC GACGGTACCG
GGTAAAGGAG AAGAACTTTT CACAGGAGTT
GTTAATGGGC ACAAATTTTC TGTCAGTGGA
CTTACCCTTA AATTTATTTG CACTACTGGA
ACTACTTTCA CTTATGGTGT TCAATGCTTT
GACTTTTTCA AGAGTGCCAT GCCCGAAGGT
GATGACGGGA ACTACAAGAC ACGTGCTGAA
AGAATCGAGT TAAAAGGTAT TGATTTTAAA
GAATACAACT ATAACTCACA CAATGTATAC
AAAGTTAACT TCAAAATTAG ACACAACATT
TATCAACAAA ATACTCCAAT TGGCGATGGC
TCCACACAAT CTGCCCTTTC GAAAGATCCC
GAGTTTGTAA CAGCTGCTGG GATTACACAT
AGATAACTGT ATCGATGGAT CCGAAGGCGG
AGTGATCTAG GCCAGCAGCC TCCCTAAAGG
TTTAATACAA GCTCTGTAAA TGGTAAAAAA
CTTGCCACTG TACAGAGCAA TATACAGACA
CCTAAGATGC TGTGGCTAAA AGAAATCAGA
GAGCAATGAT CCTGACAGTC TGAAGACTAT
AAAACCCTGT ATAAAATAGT AAAAACTGAA
AGACCTGACA TCTACTGAAA AATAGACTTT
TAGTTTTTGT GAACGTTCTC AAGATGGATA
AGATAGTCAT CAAGAAGATT GTTAAAGAAA
TAGTGTCAGA TAATGGTCCT GCCTTTGTTG
TAGAGGTCAA ATGAAAATTC CATTGTGTGT
AGAATAAATA AAACTCTAAA CAGACCTTGA
TACTTGGTAC TCCTTCCCCT TGCCCTATTT
TTTACTCTTT TTAAGATCCT TTATGGGGCT
TTAAACCTAT GTTGTTATAA TAATGATCTA
CAGAAAGAAG TCTGGTCACA ACTGGCTACA
TACCAGTTCC AGCCAGAGAT CTGATCTACG
GAATGTGTGT CAGTTAGGGT GTGGAAAGTC



TCAGATCAAC 


AGCTGCCCTT 


ACTCGAGCTC 


1500 


CGGGCCCGGG 


ATCCACCGGT 


CGCCACCATG 


1560 


GTCCCAATTC 


TTGTTGAATT 


AGATGGTGAT 


1620 


GAGGGTGAAG 


GTGATGCAAC 


ATACGGAAAA 


1680 


AAACTACCTG 


TTCCATGGCC 


AACACTTGTC 


1740 


TCAAGATACC 


CAGATCATAT 


GAAACGGCAT 


1800 


TATGTACAGG 


AAAGAACTAT 


ATTTTTCAAA 


1860 


GTCAAGTTTG 


AAGGTGATAC 


CCTTGTTAAT 


1920 


GAAGATGGAA ACATTCTTGG 


ACACAAATTG 


1980 


ATCATGGCAG 


ACAAACAAAA 


GAATGGAACC 


2040 


GAAGATGGAA 


GCGTTCAACT 


AGCAGACCAT 


2100 


CCTGTCCTTT 


TACCAGACAA 


CCATTACCTG 


2160 


AACGAAAAGA 


GAGACCACAT 


GGTCCTTCTT


2220 


GGCATGGATG 


AACTATACAA 


GTCCGGATCT 


2280 


GGACAGCAGT 


GCAGTGGTGG 


ACAGAAAGCA 




GACTTCAGCC 


CACAAAGCCA AACTTGTGGC 


2400


AAAAAAGTCT 


ACACGGACAG 


CAGGTATGCT 


2460 


AAGAGAACTG 


TTGACATCTG 


CAGAGAAAGA 




TGGCAAATCT 


AACCGCCCAG 


GCATCCTAAA 


2580


CAAGTTATAG 
AAAAGAAAAC 


ACAAATTAAG 
TAGTCCTCTC 


ACTGGTAAAA 
ATGAGAAGAC 


2700 


ACTGGAAAAA ATATGTGTAT 


GAATACCTTC 


2760 


AAAGCTTTTC 


CTTGTAAAAC 


GAGACTGATC 


2820 


ATTTTCCAAG 


GTTCGGAGTG 


CCAAAAGCAA 


2880 


CCCAGGTAAG 


TCAGGGTGTG 


GCCAAGTATT 


2940 


ACAGACCTCA 


GAGCTCAGGA 


AAGATAAAAA 


3000 


CAAAATTAAT 


CCTAGAGACT 


GGCACAGACT 


3060 


AGAACTGAGA ATACTCCCTC 


TTGATTCGGT 


3120 


CCTATGCCAT 


CACTGTCTTA AATGATGTGT


3180 


TATGTTAAGT 


TAAAAGGCTT 


GCAGGTGGTG


3240 


GTGAACAAGC 


TGGGTACCCC 


AAGGACATCT 


3300 


ATCCCCGGGT 


CGACCCGGGT 


CGACCCTGTG 


3360 


CCCAGGCTCC 


CCAGCAGGCA 


GAAGTATGCA 


3420 
AAGCATGCAT


CTCAATTAGT 



CAGCAACCAG 


GTGTGGAAAG 



TPPPPAfifJPT 


PPPPArf^PAfifZ 




CAGAAGTATG


CAAAGCATGC 


ATCTCAATTA 


GTCAGCAACC


ATAGTCCCGC



CCCTAACTCC




GCCCATCCCG 


CCCCTAACTC 



CGCCCAGTTC 



CGCCCATTCT 





fiPTGAPTAAT 
X V9/%\i« x/mx 




mrprpfp rp rp fn rn rp 
Ji X X A A X A 


TATGCAGAGG 


CCGAGGCCGC 



CTCGGCCTCT 



GAGCTATTCC


AGAAGTAGTG 




AGGAGGCTTT 


TTTGGAGGCC 


TAGGCTTTTG 


CAAAAAGCTT 


CACGCTGCCG 

w<»w WW X \3w WW 


CAAGCACTCA 



3720 


GGGCGCAAGG 


GCTGCTAAAG 



GAAGCGGAAC 


ACGTAGAAAG 


CCAGTCCGCA 


GAAACGGTGC 


3780 


TGACCCCGGA 


TGAATGTCAG 


CTACTGGGCT 


ATCTGGACAA 


GGGAAAACGC 


AAGCGCAAAG 


3840 


AGAAAGCAGG 


TAGCTTGCAG 


TGGGCTTACA 


TGGCGATAGC 


TAGACTGGGC 


GGTTTTATGG 



3900 



ACAGCAAGCG 


AACCGGAATT 


GCCAGCTGGG 


GCGCCCTCTG 


GTAAGGTTGG 

%J X «»«TVJ^7 X X WW 


GAAGCCCTGC 

wfV^w w^« w X WW 


3960 


AAAGTAAACT 


GGATGGCTTT 

WwnXwwwX X X 


CTTGCCGCCA 

W X X wwwwwwsk 


AGGATCTGAT 

nwwmx w X wnx 


GGCGCAGGGG 


ATCAAGATCT 

t\ X wAAwnX w X 


4020 


GATCAAGAGA 


CAGGATGAGG 


ATCGTTTCGC 

nx WW XXX w\9w 


ATGATTGAAC


AAGATGGATT 


GCACGCAGGT


4080




\j X XV3V3UXV3V3A 


GAGGCTATTC

Xt\X X \^ 


ovjv^ X rv i \Mi\^ X 


rZfi^^PAPAAPA 


AP A A TPfiCP 




i 1 i urii o 








PPPP/^PTTPT 


TTTTnTP A A 
1 X i X u X ^rVnu 








urUl X Uriri^ X u 


wxilj VjriV.* O/iu V3 




TV n»/^p fp p pq»p 
ni y^\3 1 \p\3\^ X o 




GCCACGACGG


GCGTTCCTTG 


CGCAGCTGTG 


CTCGACGTTG


TCACTGAAGC 


GGGAAGGGAC 


4320 


TGGCTGCTAT 


TGGGCGAAGT


GCCGGGGCAG 


GATCTCCTGT 


CATCTCACCT 


TGCTCCTGCC 


4380 


GAGAAAGTAT 


CCATCATGGC 


TGATGCAATG 


CGGCGGCTGC


ATACGCTTGA 


TCCGGCTACC 


4440 


TGCCCATTCG 


ACCACCAAGC 


GAAACATCGC 


ATCGAGCGAG 


CACGTACTCG 


GATGGAAGCC 


4500 


GGTCTTGTCG 


ATCAGGATGA 


TCTGGACGAA 


GAGCATCAGG 


GGCTCGCGCC 


AGCCGAACTG 


4560 


TTCGCCAGGC 


TCAAGGCGCG 

\^\3nn X nX \^e\x 


CATGCCCGAC 

GGTGGAARAT 
wo X uuxvuin X 


GGCGAGGATC 

GGPPGPTTTT 


TCGTCGTGAC 

PTf^HATTPAT 


CCATGGCGAT 
PGAPTGTGGP 


4620
4680

t V O V 


rGCCTGGCTC 


TGGCGGACCG

X W wwwi^www 


CTATCAGGAG 

W X Fh X Vm»w waAW 


ATAGCGTTGG 

f%XnwwwX X\3w 


C T APPrCTG A 

w Xltwwww X wT* 


TATTGCTGAA 

X n X X WW X vxrm 


4740 


GAGCTTGGPn 

19 X X V9\9^Va 


GPGAATCGGC 
wwunn X uoow 


TGACCGCTTP 

X w*»w w w w X X\^ 


w X w VI X I9w XXX 


APfiGTATPGP 

n.VaI W W X <1X WVJI W 


CGCTCCCGAT 

Www X w w ww<VX 


4800 




X\^O^V^X XWXA 


X Ww X X W X X 


Vsn^Vanu X X w X 


TPTGAGPCGG 


APTCTCCGGT 

X V X V7www X 


4860 


X Vi»V9 Ann 1 S3n\0 






TGCCATCACG 


AGATTTCGAT 


X V« wAwwU WwV9 


4920


^Vol IvrxrllVan 


nnuva x X 


X X i^vava/irix wi3 


TTTTCCGGGA 


CGGAATTCGT 


AATCTGCTGC
r\r\X w X Ow X ov^ 


1 70 w 


T T^2P a li li 2V 2i 
X X u^AA/iwVri 




CPTAPPAfiPP 


GTGGTTTGTT 


TGCCGGATCA 


ACAfiPTAPPA 




ACTCT iTTTC 






AGAGCGCAGA 


TACCAAATAC 


ioiuUl id A 


D1\J\J 


GTGTAGCCGT 


AGTTAGGCCA 


CCACTTCAAG 


AACTCTGTAG 


CACCGCCTAC 


ATACCTCGCT 


5160 


CTGCTAATCC 


TGTTACCAGT 


GGCTGCTGCC 


AGTGGCGATA AGTCGTGTCT 


TACCGGGTTG 


5220 


GACTCAAGAC 


GATAGTTACC 


GGATAAGGCG


CAGCGGTCGG 


GCTGAACGGG 


GGGTTCGTGC 


5280 


ACACAGCCCA 


GCTTGGAGCG 


AACGACCTAC 


ACCGAACTGA 


GATACCTACA 


GCGTGAGCAT


5340 


TGAGAAAGCG 


CCACGCTTCC 


CGAAGGGAGA 


AAGGCGGACA 


GGTATCCGGT 


AAGCGGCAGG 


5400 


GTCGGAACAG 


GAGAGCGCAC 


GAGGGAGCTT 


CCAGGGGGAA 


ACGCCTGGTA


TCTTTATAGT 


5460 
CCTGTCGGGT TTCGCCACCT CTGACTTGAG CGTCGATTTT TGTGATGCTC GTCAGGGGGG 5520



CGGAGCCTAT GGAAAAACGC CAGCAACGCC GAGATGCGCC GCCTCGAGTA CACCTGCGTC 5580

ATGCTGAGAC CCTCAAGCCT CACTAAAAGG GTCCCTGCCT AGTTCTGTTT ACTAATCTGC 5640 



CTTATTCTGT TTTTGTTCCC ATGTTAAAGA TAGAGTAAAT GCAGTATTCT CCACATAGAG 5700 
ATATAGACTT CTGAAATTCT AAGATTAGAA TTATTTACAA GAAGAAGTGG GGAA 5754



(2) INFORMATION FOR SEQ ID NO: 19:
 (i) SEQUENCE CHARACTERISTICS:
 (A) LENGTH: 5754 base pairs
 (B) TYPE: nucleic acid
 (C) STRANDEDNESS: single
 (D) TOPOLOGY: linear
 (ii) MOLECULE TYPE: DNA (genomic)
 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

TGAAGAATAA AAAATTACTG GCCTCTTGTG AGAACATGAA CTTTCACCTC GGAGCCCACC 60 



CCCTCCCATC TGGAAAACAT ACTTGAGAAA AACATTTTCT GGAACAACCA CAGAATGTTT 120 


CAACAGGCCA GATGTATTGC CAAACACAGG ATATGACTCT TTGGTTGAGT AAATTTGTGG 180 

TTGTTAAACT TCCCCTATTC CCTCCCCATT CCCCCTCCCA GTTTGTGGTT TTTTCCTTTA 240 

AAAGCTTGTG AAAAATTTGA GTCGTCGTCG AGACTCCTCT ACCCTGTGCA AAGGTGTATG 300

AGTTTCGACC CCAGAGCTCT GTGTGCTTTC TGTTGCTGCT TTATTTCGAC CCCAGAGCTC 360 

TGGTCTGTGT GCTTTCATGT CGCTGCTTTA TTAAATCTTA CCTTCTACAT TTTATGTATG 420 


GTCTCAGTGT CTTCTTGGGT ACGCGGCTGT CCCGGGACTT GAGTGTCTGA GTGAGGGTCT 480 

TCCCTCGAGG GTCTTTCATT TGGTACATGG GCCGGGAATT CGAGAATCTT TCATTTGGTG 540 

CATTGGCCGG GAATTCGAAA ATCTTTCATT TGGTGCATTG GCCGGGAAAC AGCGCGACCA 600

CCCAGAGGTC CTAGACCCAC TTAGAGGTAA GATTCTTTGT TCTGTTTTGG TCTGATGTCT 660 

GTGTTCTGAT GTCTGTGTTC TGTTTCTAAG TCTGGTGCGA TCGCAGTTTC AGTTTTGCGG 720 


ACGCTCAGTG AGACCGCGCT CCGAGAGGGA GTGCGGGGTG GATAAGGATA GACGTGTCCA 780 

GGTGTCCACC GTCCGTTCGC CCTGGGAGAC GTCCCAGGAG GAACAGGGGA GGATCAGGGA 840 

CGCCTGGTGG ACCCCTTTGA AGGCCAAGAG ACCATTTGGG GTTGCGAGAT CGTGGGTTCG 900

AGTCCCACCT CGTGCCCAGT TGCGAGATCG TGGGTTCGAG TCCCACCTCG TGTTTTGTTG 960 

CGAGATCGTG GGTTCGAGTC CCACCTCGCG TCTGGTCACG GGATCGTGGG TTCGAGTCCC 1020

ACCTCGTGTT TTGTTGCGAG ATCGTGGGTT CGAGTCCCAC CTCGCGTCTG GTCACGGGAT 1080 

CGTGGGTTCG AGTCCCACCT CGTGCAGAGG GTCTCAATTG GCCGGCCTTA GAGAGGCCAT 1140 

CTGATTCTTC TGGTTTCTCT TTTTGTCTTA GTCTCGTGTC CGCTCTTGTT GTGACTACTG 1200
TTTTTCTAAA


AATGGGACAA TCTGTGTCCA CTCCCCTTTC 


TCTGACTCTG 


GTTCTGTCGC 


1260 


TTGGTAATTT 


TGTTTGTTTA 


CGTTTGTTTT TGTGAGTCGT 


CTATGTTGTC 


TGTTACTATC 


1320 


TTGTTTTTGT 


TTGTGGTTTA 


CGGTTTCTGT GTGTGTCTTG 


TGTGTCTCTT 


TGTGTTCAGA 


1380 


CTTGGACTGA 


TGACTGACGA 


CTGTTTTTAA GTTATGCCTT 


CTAAAATAAG 


CCTAAAAATC 


1440 


CTGTCAGATC 


CCTATGCTGA 


CCACTTCCTT TCAGATCAAC 


AGCTGCCCTT 


ACTCGAGCTC 


1500 


AAGCTTCGAA 


TTCTGCAGTC 


GACGGTACCG CGGGCCCGGG 


ATCCACCGGT 


CGCCACCATG 


1560 


GGTAAAGGAG 


AAGAACTTTT 


CACTGGAGTT GTCCCAATTC 


TTGTTGAATT 


AGATGGTGAT 


1620 


GTTAATGGGC 


ACAAATTTTC 


TGTCAGTGGA GAGGGTGAAG 


GTGATGCAAC


ATACGGAAAA 


1680 


CTTACCCTTA 


AATTTATTTG 


CACTACTGGA AAACTACCTG 


TTCCATGGCC 


AACACTTGTC 


1740 


ACTACTTTCT 


CTTATGGTGT 


TCAATGCTTT TCAAGATACC 


CAGATCATAT 


GAAACGGCAT 


1800 


GACTTTTTCA 


AGAGTGCCAT 


GCCCGAAGGT TATGTACAGG 


AAAGAACTAT 


ATTTTTCAAA 


1860 


GATGACGGGA 


ACTACAAGAC 


ACGTGCTGAA GTCAAGTTTG 


AAGGTGATAC 


CCTTGTTAAT 


1920 


AGAATCGAGT 


TAAAAGGTAT 


TGATTTTAAA GAAGATGGAA 


ACATTCTTGG 


ACACAAATTG 


1980 


GAATACAACT 


ATAACTCACA 


CAATGTATAC ATCATGGCAG 


ACAAACAAAA 


GAATGGAACC


2040 


AAAGTTAACT 


TCAAAATTAG ACACAACATT GAAGATGGAA 


GCGTTCAACT 


AGCAGACCAT 


2100 


TATCAACAAA


ATACTCCAAT 


TGGCGATGGC CCTGTCCTTT 


TACCAGACAA 


CCATTACCTG 


2160 


TCCACACAAT 


CTGCCCTTTC 


GAAAGATCCC AACGAAAAGA 


GAGACCACAT 


GGTCCTTCTT 


2220 


GAGTTTGTAA 


CAGCTGCTGG 


GATTACACAT GGCATGGATG 


AACTATACAA 


GTCCGGATCT 


2280 


AGATAACTGT 
AGTGATCTAG 


ATCGATGGAT 
GCCAGCAGCC 


CCGAAGGCGG GGACAGCAGT 
TCCCTAAAGG GACTTCAGCC 


GCAGTGGTGG 
CACAAAGCCA 


ACAGAAAGCA 
AACTTGTGGC 


2340 
2400 


TTTAATACAA 


GCTCTGTAAA TGGTAAAAAA AAAAAAGTCT


ACACGGACAG 


CAGGTATGCT 


2460 


CTTGCCACTG 


TACAGAGCAA TATACAGACA AAGAGAACTG 


TTGACATCTG 


CAGAGAAAGA 


2520 


CCTAAGATGC 


TGTGGCTAAA AGAAATCAGA TGGCAAATCT 


AACCGCCCAG 


GCATCCTAAA 


2580 


GAGCAATGAT 


CCTGACAGTC 


TGAAGACTAT CAAGTTATAG 


ACAAATTAAG 


ACTGGTAAAA 


2640 


AAAACCCTGT


ATAAAATAGT 


AAAAACTGAA AAAAGAAAAC 


TAGTCCTCTC 


ATGAGAAGAC 


2700 


AGACCTGACA 


TCTACTGAAA AATAGACTTT ACTGGAAAAA ATATGTGTAT 


GAATACCTTC 


2760 


TAGTTTTTGT 


GAACGTTCTC 


AAGATGGATA AAAGCTTTTC 


CTTGTAAAAC 


GAGACTGATC 


2820 


AGATAGTCAT 


CAAGAAGATT 


GTTAAAGAAA ATTTTCCAAG 


GTTCGGAGTG 


CCAAAAGCAA 


2880 



TAGTGTCAGA 


TAATGGTCCT GCCTTTGTTG CCCAGGTAAG 


TCAGGGTGTG 


GCCAAGTATT 


2940 


TAGAGGTCAA


ATGAAAATTC 


CATTGTGTGT ACAGACCTCA GAGCTCAGGA 


AAGATAAAAA 


3000 


AGAATAAATA 


AAACTCTAAA 


CAGACCTTGA CAAAATTAAT 


CCTAGAGACT 


GGCACAGACT 


3060 


TACTTGGTAC 


TCCTTCCCCT 


TGCCCTATTT AGAACTGAGA ATACTCCCTC 


TTGATTCGGT 


3120 


TTTACTCTTT 


TTAAGATCCT 


TTATGGGGCT CCTATGCCAT 


CACTGTCTTA 


AATGATGTGT 


3180 


TTAAACCTAT 


GTTGTTATAA 


TAATGATCTA TATGTTAAGT 


TAAAAGGCTT 


GCAGGTGGTG 


3240 
CAGAAAGAAG TCTGGTCACA ACTGGCTACA



TACCAGTTCC AGCCAGAGAT CTGATCTACG
GAATGTGTGT CAGTTAGGGT GTGGAAAGTC
AAGCATGCAT CTCAATTAGT CAGCAACCAG
CAGAAGTATG CAAAGCATGC ATCTCAATTA
GCCCATCCCG CCCCTAACTC CGCCCAGTTC
TTTTTTTATT TATGCAGAGG CCGAGGCCGC
AGGAGGCTTT TTTGGAGGCC TAGGCTTTTG
GGGCGCAAGG GCTGCTAAAG GAAGCGGAAC
TGACCCCGGA TGAATGTCAG CTACTGGGCT
AGAAAGCAGG TAGCTTGCAG TGGGCTTACA
ACAGCAAGCG AACCGGAATT GCCAGCTGGG
AAAGTAAACT GGATGGCTTT CTTGCCGCCA
GATCAAGAGA CAGGATGAGG ATCGTTTCGC
TCTCCGGCCG CTTGGGTGGA GAGGCTATTC
TGCTCTGATG CCGCCGTGTT CCGGCTGTCA
ACCGACCTGT CCGGTGCCCT GAATGAACTG
GCCACGACGG GCGTTCCTTG CGCAGCTGTG
TGGCTGCTAT TGGGCGAAGT GCCGGGGCAG
GAGAAAGTAT CCATCATGGC TGATGCAATG
TGCCCATTCG ACCACCAAGC GAAACATCGC
GGTCTTGTCG ATCAGGATGA TCTGGACGAA
TTCGCCAGGC TCAAGGCGCG CATGCCCGAC
GCCTGCTTGC CGAATATCAT GGTGGAAAAT
CGGCTGGGTG TGGCGGACCG CTATCAGGAC
GAGCTTGGCG GCGAATGGGC TGACCGCTTC
TCGCAGCGCA TCGCCTTCTA TCGCCTTCTT
TCGAAATGAC CGACCAAGCG ACGCCCAACC
CCTTCTATGA AAGGTTGGGC TTCGGAATCG
TTGCAAACAA AAAAACCACC GCTACCAGCG
ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC
GTGTAGCCGT AGTTAGGCCA CCACTTCAAG
CTGCTAATCC TGTTACCAGT GGCTGCTGCC



GTGAACAAGC 


TGGGTACCCC 


AAGGACATCT 


3300 


ATCCCCGGGT 


CGACCCGGGT


CGACCCTGTG 


3360 


CCCAGGCTCC 


CCAGCAGGCA 


GAAGTATGCA 


3420 


GTGTGGAAAG 


TCCCCAGGCT 


CCCCAGCAGG 


3480 


GTCAGCAACC 


ATAGTCCCGC 


CCCTAACTCC 


3540 


CGCCCATTCT 


CCGCCCCATG 


GCTGACTAAT 


3600 


CTCGGCCTCT 


GAGCTATTCC 


AGAAGTAGTG 


3660 


CAAAAAGCTT 


CACGCTGCCG 


CAAGCACTCA 


3720 


ACGTAGAAAG 


CCAGTCCGCA 


GAAACGGTGC 


3780 


ATCTGGACAA 


GGGAAAACGC 


AAGCGCAAAG 


3840 


TGGCGATAGC 


TAGACTGGGC 


GGTTTTATGG 


3900 


GCGCCCTCTG 


GTAAGGTTGG 


GAAGCCCTGC 


3960 


tv3\DJ\i \» i. orll 




2V T r* 21 a C 2i T P T 


AC\OC\ 








H UOU 














TTTTfiTPHan 
1 X X X 0 1 ^/inu 








A X X VaVJw X Vj 


AO fid 


GATCTCCTGT 


CATCTCACCT 


TGCTCCTGCC 



4380 


CGGCGGCTGC 


ATACGCTTGA 


TCCGGCTACC 


4440 


ATCGAGCGAG 


CACGTACTCG 


GATGGAAGCC 


4500 


GAGCATCAGG 


GGCTCGCGCC 


AGCCGAACTG 


4560 


GGCGAGGATC 


TCGTCGTGAC 


CCATGGCGAT


4620 


GGCCGCTTTT 


CTGGATTCAT 


CGACTGTGGC 


4680 


ATAGCGTTGG 


CTACCCGTGA 


TATTGCTGAA 


4740 


CTCGTGCTTT 


ACGGTATCGC 


CGCTCCCGAT 


4800 


GACGAGTTCT 


TCTGAGCGGG 


ACTCTGGGGT 


4860 


TGCCATCACG 


AGATTTCGAT 


TCCACCGCCG 


4920 


TTTTCCGGGA 


CGGAATTCGT 


AATCTGCTGC 


4980 


GTGGTTTGTT 


TGCCGGATCA 


AGAGCTACCA 


5040 


AGAGCGCAGA 


TACCAAATAC 


TGTCCTTCTA 


5100 


AACTCTGTAG 


CACCGCCTAC 


ATACCTCGCT 


5160 


AGTGGCGATA AGTCGTGTCT 


TACCGGGTTG 


5220 
GACTCAAGAC


GATAGTTACC 


GGATAAGGCG 


CAGCGGTCGG 


GCTGAACGGG 


GGGTTCGTGC 


5280


ACACAGCCCA 


GCTTGGAGCG 


AACGACCTAC 


ACCGAACTGA GATACCTACA 


GCGTGAGCAT 


5340 


TGAGAAAGCG 


CCACGCTTCC 


CGAAGGGAGA AAGGCGGACA 


GGTATCCGGT 


AAGCGGCAGG 


5400 


GTCGGAACAG 


GAGAGCGCAC 


GAGGGAGCTT 


CCAGGGGGAA ACGCCTGGTA 


TCTTTATAGT 


5460


CCTGTCGGGT 


TTCGCCACCT 


CTGACTTGAG 


CGTCGATTTT 


TGTGATGCTC 


GTCAGGGGGG 


5520 


CGGAGCCTAT 


GGAAAAACGC 


CAGCAACGCC 


GAGATGCGCC 


GCCTCGAGTA 


CACCTGCGTC 


5580 


ATGCTGAGAC


CCTCAAGCCT 


CACTAAAAGG 


GTCCCTGCCT 


AGTTCTGTTT 


ACTAATCTGC 


5640 


CTTATTCTGT 


TTTTGTTCCC 


ATGTTAAAGA TAGAGTAAAT 


GCAGTATTCT 


CCACATAGAG 


5700 


ATATAGACTT 


CTGAAATTCT 


AAGATTAGAA 


TTATTTACAA 


GAAGAAGTGG 


GGAA 


5754 



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4958 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:



AGGCGGGGAC 


AGCAGTGCAG 


TGGTGGACAG 


AAAGCAAGTG 


ATCTAGGCCA 


GCAGCCTCCC 


60 


TAAAGGGACT 


TCAGCCCACA 


AAGCCAAACT 


TGTGGCTTTA 


ATACAAGCTC 


TGTAAATGGT 


120 


AAAAAAAAAA 


AAGTCTACAC 


GGACAGCAGG 


TATGCTCTTG 


CCACTGTACA 


GAGCAATATA 


180 


CAGACAAAGA 


GAACTGTTGA 


CATCTGCAGA 


GAAAGACCTA 


AGATGCTGTG 


GCTAAAAGAA 


240 


ATCAGATGGC 


AAATCTAACC 


GCCCAGGCAT 


CCTAAAGAGC 


AATGATCCTG 


ACAGTCTGAA 


300 


GACTATCAAG 


TTATAGACAA 


ATTAAGACTG 


GTAAAAAAAA 


CCCTGTATAA 


AATAGTAAAA 


360 


ACTGAAAAAA 


GAAAACTAGT 


CCTCTCATGA 


GAAGACAGAC 


CTGACATCTA 


CTGAAAAATA 


420 


GACTTTACTG 


GAAAAAATAT 


GTGTATGAAT 


ACCTTCTAGT 


TTTTGTGAAC 


GTTCTCAAGA 


480 


TGGATAAAAG


CTTTTCCTTG 


TAAAACGAGA 


CTGATCAGAT 


AGTCATCAAG 


AAGATTGTTA 


540 


AAGAAAATTT 


TCCAAGGTTC 


GGAGTGCCAA 


AAGCAATAGT 


GTCAGATAAT 


GGTCCTGCCT 


600 


TTGTTGCCCA 


GGTAAGTCAG 


GGTGTGGCCA 


AGTATTTAGA 


GGTCAAATGA 


AAATTCCATT 


660 


GTGTGTACAG 


ACCTCAGAGC 


TCAGGAAAGA 


TAAAAAAGAA 


TAAATAAAAC 


TCTAAACAGA 


720 


CCTTGACAAA 


ATTAATCCTA 


GAGACTGGCA 


CAGACTTACT 


TGGTACTCCT 


TCCCCTTGCC 


780 


CTATTTAGAA 


CTGAGAATAC 


TCCCTCTTGA 


TTCGGTTTTA 


CTCTTTTTAA 


GATCCTTTAT 


840 


GGGGCTCCTA


TGCCATCACT 


GTCTTAAATG 


ATGTGTTTAA 


ACCTATGTTG 


TTATAATAAT 


900 


GATCTATATG 


TTAAGTTAAA 


AGGCTTGCAG 


GTGGTGCAGA 


AAGAAGTCTG 


GTCACAACTG 


960 


GCTACAGTGA 


ACAAGCTGGG 


TACCCCAAGG 


ACATCTTACC 


AGTTCCAGCC 


AGAGATCTGA 


1020 
TCTACGATCC


CCGGGTCGAC 


CCGGGTCGAC 


CCTGTGGAAT 


GTGTGTCAGT 


TAGGGTGTGG 


1080 


AAAGTCCCCA 


GGCTCCCCAG 


CAGGCAGAAG 


TATGCAAAGC 


ATGCATCTCA ATTAGTCAGC 


1140 


AACCAGGTGT 


GGAAAGTCCC 


CAGGCTCCCC 


AGCAGGCAGA 


AGTATGCAAA 


GCATGCATCT 


1200 


CAATTAGTCA 


GCAACCATAG 


TCCCGCCCCT 


AACTCCGCCC 


ATCCCGCCCC 


TAACTCCGCC 


1260 


CAGTTCCGCC 


CATTCTCCGC 


CCCATGGCTG 


ACTAATTTTT 


TTTATTTATG 


CAGAGGCCGA 


1320 


GGCCGCCTCG 


GCCTCTGAGC 


TATTCCAGAA 


GTAGTGAGGA 


GGCTTTTTTG 


GAGGCCTAGG 


1380 


CTTTTGCAAA 


AAGCTTCACG 


CTGCCGCAAG 


CACTCAGGGC 


GCAAGGGCTG 


CTAAAGGAAG 


1440 


CGGAACACGT


AGAAAGCCAG 


TCCGCAGAAA 


CGGTGCTGAC 


CCCGGATGAA 


TGTCAGCTAC 


1500 


TGGGCTATCT 


GGACAAGGGA AAACGCAAGC 


GCAAAGAGAA AGCAGGTAGC 


TTGCAGTGGG 


1560 


CTTACATGGC 


GATAGCTAGA 


CTGGGCGGTT 


TTATGGACAG 


CAAGCGAACC 


GGAATTGCCA 


1620 


GCTGGGGCGC 


CCTCTGGTAA 


GGTTGGGAAG 


CCCTGCAAAG 


TAAACTGGAT 


GGCTTTCTTG 


1680 


CCGCCAAGGA 


TCTGATGGCG 


CAGGGGATCA 


AGATCTGATC 


AAGAGACAGG 


ATGAGGATCG 


1740 


TTTCGCATGA 


TTGAACAAGA 


TGGATTGCAC 


GCAGGTTCTC 


CGGCCGCTTG 


GGTGGAGAGG 


1800 


CTATTCGGCT 


ATGACTGGGC


ACAACAGACA ATCGGCTGCT 


CTGATGCCGC 


CGTGTTCCGG 


1860 


CTGTCAGCGC 


AGGGGCGCCC 


GGTTCTTTTT 


GTCAAGACCG 


ACCTGTCCGG 


TGCCCTGAAT 


1920 


GAACTGCAGG 


ACGAGGCAGC 


GCGGCTATCG 


TGGCTGGCCA 


CGACGGGCGT 


TCCTTGCGCA 


1980 



GCTGTGCTCG 
GGGCAGGATC


ACGTTGTCAC 
TCCTGTCATC 


TGAAGCGGGA 
TCACCTTGCT 


AGGGACTGGC 
CCTGCCGAGA 


TGCTATTGGG 
AAGTATCCAT 


CGAAGTGCCG 
CATGGCTGAT 


2040 
2100 


GCAATGCGGC 


GGCTGCATAC 


GCTTGATCCG 


GCTACCTGCC 


CATTCGACCA 


CCAAGCGAAA 


2160


CATCGCATCG 


AGCGAGCACG 


TACTCGGATG 


GAAGCCGGTC 


TTGTCGATCA 


GGATGATCTG 


2220 


GACGAAGAGC 


ATCAGGGGCT 


CGCGCCAGCC 


GAACTGTTCG 


CCAGGCTCAA 


GGCGCGCATG 


2280 


CCCGACGGCG 


AGGATCTCGT 


CGTGACCCAT 


GGCGATGCCT


GCTTGCCGAA 


TATCATGGTG 


2340 


GAAAATGGCC


GCTTTTCTGG


ATTCATCGAC 


TGTGGCCGGC 


TGGGTGTGGC 


GGACCGCTAT 


2400 


CAGGACATAG 


CGTTGGCTAC 


CCGTGATATT 


GCTGAAGAGC 


TTGGCGGCGA 


ATGGGCTGAC 


2460 


CGCTTCCTCG 


TGCTTTACGG 


TATCGCCGCT 


CCCGATTCGC 


AGCGCATCGC 


CTTCTATCGC 


2520 


CTTCTTGACG 


AGTTCTTCTG 


AGCGGGACTC 


TGGGGTTCGA AATGACCGAC 


CAAGCGACGC 


2580 


CCAACCTGCC 


ATCACGAGAT 


TTCGATTCCA 


CCGCCGCCTT 


CTATGAAAGG 


TTGGGCTTCG 


2640 


GAATCGTTTT 


CCGGGACGGA ATTCGTAATC 


TGCTGCTTGC 


AAACAAAAAA ACCACCGCTA 


2700 


CCAGCGGTGG


TTTGTTTGCC 


GGATCAAGAG 


CTACCAACTC 


TTTTTCCGAA 


GGTAACTGGC 


2760 


TTCAGCAGAG 


CGCAGATACC 


AAATACTGTC 


CTTCTAGTGT 


AGCCGTAGTT 


AGGCCACCAC 


2820 


TTCAAGAACT 


CTGTAGCACC 


GCCTACATAC 


CTCGCTCTGC 


TAATCCTGTT 


ACCAGTGGCT 


2880 


GCTGCCAGTG 


GCGATAAGTC 


GTGTCTTACC 


GGGTTGGACT 


CAAGACGATA 


GTTACCGGAT 


2940 


AAGGCGCAGC 


GGTCGGGCTG 


AACGGGGGGT 


TCGTGCACAC 


AGCCCAGCTT 


GGAGCGAACG 


3000 
ACCTACACCG AACTGAGATA CCTACAGCGT GAGCATTGAG AAAGCGCCAC GCTTCCCGAA 3060

GGGAGAAAGG CGGACAGGTA TCCGGTAAGC GGCAGGGTCG GAACAGGAGA GCGCACGAGG 3120

GAGCTTCCAG GGGGAAACGC CTGGTATCTT TATAGTCCTG TCGGGTTTCG CCACCTCTGA 3180

CTTGAGCGTC GATTTTTGTG ATGCTCGTCA GGGGGGCGGA GCCTATGGAA AAACGCCAGC 3240


AACGCCGAGA 


TGCGCCGCCT 


CGAGTACACC 


TGCGTCATGC 


TGAGACCCTC 


AAGCCTCACT 


3300 


AAAAGGGTCC 


CTGCCTAGTT 


CTGTTTACTA ATCTGCCTTA 


TTCTGTTTTT 


GTTCCCATGT 

WX X^WXM^XwX 


3360 


TAAAGATAGA 


GTAAATGCAG 


TATTCTCCAC 


ATAGAGATAT 


AGACTTCTGA 


AAT T CT AAG A 


3420 

V *a fc» w 


TTAGAATTAT 


TTACAAGAAG 


AAGTGGGGAA


TGAAGAATAA 


AAAATTAPTG 
ivmn X X nw x \3 


GCCTCT TGTC 
\9W»Vi« X X X ^^ X \j 


3480


AGAACATGAA 


PTTTPAPPTC 
v<* X X X \^n\^\* X w 


GGAGCCCACC 


CCCTCCCATC 


TGG&AAAP&T 
X uVaAAAA^ A X 


APTTGAGAAA 
AL»X XVsAUAAA 




AAPATTTTPT 

/VlWii 1 1 XVrl 


ft^AAPAAPPA 


CAGAATGTTT 


CAACAGGCCA 


uAxu XAX XoV# 


PIX 2i ZiPUPA^C 
UAAA^AWiUU 


JOUU 


IiTAT/2APTPT 


X X X X VaAu X 


AAATTTGTGG 


TTGTTAAACT 


TPPPPT aTTP 


PPTPPPPIiTT 


JOOU 


CCCCCTCCCA 


GTTTGTGGTT 


TTTTCCTTTA 


AAAGCTTGTG 


AAAAATTTGA 


GTCGTCGTCG 


3720 


AGACTCCTCT 


ACCCTGTGCA 


AAGGTGTATG 


AGTTTCGACC 


CCAGAGCTCT 


GTGTGCTTTC 


3780 


TGTTGCTGCT 


TTATTTCGAC 


CCCAGAGCTC 


TGGTCTGTGT 


GCTTTCATGT 


CGCTGCTTTA 


3840 


TTAAATCTTA 


CCTTCTACAT 


TTTATGTATG 


GTCTCAGTGT 


CTTCTTGGGT 



ACGCGGCTGT 


3900 


CCCGGGACTT 


GAGTGTCTGA 


GTGAGGGTCT 


TCCCTCGAGG 


GTCTTTCATT 


TGGTACATGG 


3960 


GCCGGGAATT

1 i vWil 1 13 


CGAGAATCTT 

CPP^nnA A AP 


TCATTTGGTG 
AGCGCGACCA 


CATTGGCCGG 
CCCAGAGGTC 


GAATTCGAAA 

PT 11^211 PPPUP 
W X AuA^^V.* AV« 


ATCTTTCATT 

TT AG AGGT A A 
1 XA\3Au\9XAA 


4020 




X\.fXUXXXX 


TCTGATGTCT 


GTGTTCTGAT 


P T TP 
UXV.«XV3XoX XV> 


X O X X X V« X AAV3 


4140 


X ^ X V3V3 X UV^Url 


TPGPAGTTTP 


AGTTTTGCGG 


ACGCTCAGTG 


AGAPPGPGPT 


PPGAGAGGGA 


4200 


O X Ov«V3V3V9V9 X V9 


GATAAGGATA 


GACGTGTCCA 


GGTGTCCACC 


GTPPGTTPGP 


PPTGGGAGAP 


4260 


GTPPrAGGAG 


GAAPAGGGGA 


GGATCAGGGA


CGCCTGGTGG 


APPPPTTTGA 


AGGPPAAGAG 


4320 


APPATTTRGf; 
n\*\*n,x X X ui9\3 


GTTGPGAGAT* 


CGTGGGTTCG 




PGTGPPPAGT 


TGPGAGATPG 




TfififZT'or'fi'hn. 

1 UUU X i Avi 




TGTTTTGTTG 


CGAGATCGTG 


*p*ppdi rsTP 
uu 1 1 wuAo 1 




4440 






TTCGAGTCCC 


ACCTCGTGTT 


TTPTTPPPaP 


» TPPTP/^/STT 


A >son 






GTCACGGGAT 


CGTGGGTTCG 


Au i L.LUAUC 1 


UV9 1 uCAbAulj 


4 0 0U 






GAGAGGCCAT 


CTGATTCTTC 


111 ^rt rtl rt^ #P 


X X 1 X VaXUx XA 




b X VJl Lvs xbxC 




GTGACTACTG 


TTTTTCTAAA 


AAi GGGACAA 


xUTbXbxLCA 




CTCCCCTTTC 


TCTGACTCTG 


GTTCTGTCGC 


TTGGTAATTT 


TGTTTGTTTA 


CGTTTGTTTT 


4740 


TGTGAGTCGT 


CTATGTTGTC 


TGTTACTATC 


TTGTTTTTGT 


TTGTGGTTTA 


CGGTTTCTGT 


4800 


GTGTGTCTTG 


TGTGTCTCTT 


TGTGTTCAGA 


CTTGGACTGA 


TGACTGACGA 


CTGTTTTTAA 


4860 


GTTATGCCTT 


CTAAAATAAG 


CCTAAAAATC 


CTGTCAGATC 


CCTATGCTGA 


CCACTTCCTT 


4920 


TCAGATCAAC 


AGCTGCCCTT


ACGTATCGAT 


GGATCCGA 






4958 


(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7080 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:



GAATACAAGC 


TTGCATGCCT 






PTTGAArSAAT 


AAAAAATTAP 




TGGCCTCTTG 


TGAGAACATG 


aapTTTr'apr* 

nnx^ 111 ^^AkoVir 


TPfifiAfiPPPA 




TPTnriAAAAP 


1 90 


ATACTTGAGA AAAACATTTT 


PT cnn aP2i Af 


P AP A ATCP 


TTPAAPAfifiP 
1 1 WAA^ AuvVi* 


PACATi^TATT 
V«nuAlV9lAl 1 


1 AO 


GCCAAACACA 


GGATATGACT 


v,^ i 1 i 1 1 VsA 


ulAAAl 1 lul 


ouX iul lAAA 


PTTPPPPTAT 




TCCCTCCCCA 


TTCCCCCTCC 


PJUPTTTPTPP 
WiO i 1 1 U 1 


wi wi fw frt ^rt ^> ftt flfl 

111111 \*\^ 1 1 


TBaaappTTP 

lAAAAuCl lu 


A A A a B *PTT 
1 uAAAAAl 1 X 




GAGTCGTCGT 


CGAGACTCCT 


\^ i /i\.»L>L* lulu 


^,#/^nnv3U 1 u 1 A 


TPaPTTTPPa 
1 oAo 111 UuA 


PPPP TV <^ a P PT* 




CTGTGTGCTT 


TCTGTTGCTG 


L^lll/illl \^\3 






ulu^iol 1 IV-^nl 


490 
4 ^ u 


GTCGCTGCTT 


TATTAAATCT 


TaPPTTPTUP 


Al 1 1 lAllalA 


1 UU 1^1 wAu 1 


\3l\^X 1V«1 1 UU 


4 RO 


GTACGCGGCT


GTCCCGGGAC 


1 1 u Ao 1 u i W 1 


VaAu 1 oAVtuu 1 


PTTPPPTPflA 
I 1 ^uA 


Cfi/?TPTTTPA 
UUl3 X V« 1 1 1 V-»ri 


(\40 

34 U 


TTTGGTACAT 


GGGCCGGGAA 


TTCGAGAATC 


TTTCATTTGG 


1 vjWAI i u\aV/\..« 


A A TTP^ A 
\3V9\aAAX X\«V3A 


^00 


AAATCTTTCA 


TTTGGTGCAT 


TGGCCGGGAA 


ACAGCGCGAC 


CACCCAGAGG 




otou 


ACTTAGAGGT 


AAGATTCTTT 


GTTCTGTTTT 


GGTCTGATGT 


CTGTGTTCTG 


AluXLoX VaXuX 




TCTGTTTCTA 


AGTCTGGTGC 


GATCGCAGTT 


TCAGTTTTGC 


GGACGCTCAG 


TGAGACCGCG


ion 


CTCCGAGAGG 


GAGTGCGGGG 


TGGATAAGGA 


TAGACGTGTC 


CAGGTGTCCA 


CCGTCCGTTC 


840 


GCCCTGGGAG 


ACGTCCCAGG 


AGGAACAGGG 


GAGGATCAGG 


GACGCCTGGT 


GGACCCCTTT 


900 


GAAGGCCAAG 


AGACCATTTG 


GGGTTGCGAG


ATCGTGGGTT 


CGAGTCCCAC 


CATCGATGGT 


960 


GCAGAGGGTC 


TCAATTGGCC 


GGCCTTAGAA 


TTACGGATCT 


AGCATGATTG 


AACAAGATGG 


1020 


ATTGCACGCA GGTTCTCCGG 


CCGCTTGGGT 


GGAGAGGCTA 


TTCGGCTATG 


ACTGGGCACA 


1080 


ACAGACAATC 


GGCTGCTCTG 


ATGCCGCCGT 


GTTCCGGCTG 


TCAGCGCAGG 


GGCGCCCGGT 


1140 


TCTTTTTGTC 


AAGACCGACC 


TGTCCGGTGC 


CCTGAATGAA 


CTGCAGGACG 


AGGCAGCGCG 


1200 


GCTATCGTGG 


CTGGCCACGA 


CGGGCGTTCC 


TTGCGCAGCT 


GTGCTCGACG 


TTGTCACTGA 


1260 


AGCGGGAAGG 


GACTGGCTGC 


TATTGGGCGA AGTGCCGGGG


CAGGATCTCC 


TGTCATCTCA 


1320 


CCTTGCTCCT 


GCCGAGAAAG


TATCCATCAT 


GGCTGATGCA ATGCGGCGGC 


TGCATACGCT 


1380 


TGATCCGGCT 


ACCTGCCCAT 


TCGACCACCA AGCGAAACAT 


CGCATCGAGC 


GAGCACGTAC 


1440 


TCGGATGGAA GCCGGTCTTG 


TCGATCAGGA 


TGATCTGGAC 


GAAGAGCATC 


AGGGGCTCGC 


1500 


GCCAGCCGAA 


CTGTTCGCCA 


GGCTCAAGGC 


GCGCATGCCC 


GACGGCGAGG 


ATCTCGTCGT 


1560 
GACCCATGGC GATGCCTGCT TGCCGAATAT
CATCGACTGT GGCCGGCTGG GTGTGGCGGA
TGATATTGCT GAAGAGCTTG GCGGCGAATG
CGCCGCTCCC GATTCGCAGC GCATCGCCTT
GGGACTCTGG GGTTCGTAAT GACCGACCAA
GATTCCACCG CCGCCTTCTA TGAAAGGTTG
TTGTCAATTC TATTATTTCA ATACAGAACA
TTGTATATTA TGATTGTCCC TCGAACCATG
TCTGTCATCT GCCAGGCCAT TAAGTTATTC
CATATCATAA ACACATTTGA AATTGAGTAT
TGTATCCTCA GAAAAAAAGT TTGTTATAAA
ATATTCCAGC TATAGGAAAG AAAGTGCGTC
TCACATGCAT GCTTCTTTAT TTCTCCTATT
TCTCACTTAT GTCCTGCCTA GCATGGCTCA
ATGAAACAGA CTTCTGGTCT GTTACTACAA
GCTAATTATG TTTTCCATCT CTAAGGTTCC
TATCTGGTTG TAACTGAAGC TCAATGGAAC
TCCAACAGTC CTGATGGATT AGCAGAACAG
AACTAATATT TGCTCTCCAT TCAATCCAAA
ATCCCATTAA ATGATTTCTA TGGCGTCAAA
GGTCACAATT CAGGCTATAT ATTCCCCAGG
ATCCTGTGGA CAGCTCACCT AGCTGCAATG
GCTTTTGGCC TGCTCTGCCT GCCCTGGCTT
TTATCCAGGC TTTTTGACAA CGCTATGCTC
GACACCTACC AGGAGTTTGA AGAAGCCTAT
CAGAACCCCC AGACCTCCCT CTGTTTCTCA
GAAACACAAC AGAAATCCAA CCTAGAGCTG
TGGCTGGAGC CCGTGCAGTT CCTCAGGAGT
TCTGACAGCA ACGTCTATGA CCTCCTAAAG
GGGAGGCTGG AAGATGGCAG CCCCCGGACT
TTCGACACAA ACTCACACAA CGATGACGCA
TTCAGGAAGG ACATGGACAA GGTCGAGACA
GAGGGCAGCT GTGGCTTCTA GCTGCCCGGG
TCCTGGCCCT GGAAGTTGCC ACTCCAGTGC



CATGGTGGAA 


AATGGPPGPT 
nnx u\3w^^9w X 


X X x^ivavsAX Jl 




CCGCTATCAG 


GACATAGCGT 
wAwn X jn\jf\»\9 X 


TGGCTACPCG 

X wV3Vi« X XltWwV* w 


X wOV 


GGCTGACCGC 


TTCCTCGTGC 


TTTACGGTAT 

X X xriiwwwxrix 


1740 


CTATCGCCTT 


CTTGACGAGT 


TCTTCTGAGC 


1800 

«b V w V 


GCGACGCCCA 


ACCTGCCATC 

• iw W X W WW«bX w 


ACGAGATTTC 


1860 


GGCTTCGGAG 


TTAGCTTGTT 

A X nss\mr X X W X X 


TCTTTACTGT 

X w XXX «*W X w X 


1920 


ATAGCTTCTA 


TAACTGAAAT 

X w X wXVXT* X 


AT AT T T GC T A 

r%x«»x X xwwxf^ 


1980 


AACACTCCTC 


CAGCTGAATT 

wAww X wn^nx X 


TCACAATTCC 

X wirVwinnX X WW 


2040 


ATGGAAGATC 


TTTGACGAAC 
XXX \3nuunnw 


ACTGCAAGTT 
X w w^v%w X X 


2100 

^ X w 


TGTTTTRPAT 


TGTATGGAGC 


TATGTTTTGC 


2160 


GCATTCACAC 


CCATAAAAAG 


ATAGATTTAA 


2220 


TGCTCTTCAC 


TCTAGTCTCA 


GTTGGCTCCT 


2280 


TTGTCAAGAA 


AATAATAGGT 


CACGTCTTGT 


2340 


GATGCACGTT 


GTAGATACAA 


GAAGGATCAA 


2400 


CCATAGTAAT 


AAGCACACTA ACTAATAATT 


2460 


CACATTTTTC 


TGTTTTCTTA AAGATCCCAT 
TTTCCCAGTC TTCTCTCCCA 


2520 




CATTGTTACC 


CAGAATTAAA 


2640 


ATGGACCTAT 


TGAAACTAAA ATCTAACCCA 


2700 


GGTCAAACTT 


CTGAAGGGAA 


CCTGTGGGTG 


2760 


GCTCAGCCAG 


TGTCTGTACA 


TACACAACGG 


2820 


GCTACAGGCT 
WW X nwnw w w X 


CCCGGACGTC 


CCTGCTCCTG 


2880 

V W V 


PAAGAGGGCA 


GTGCCTTCCC 


AACCATTCCC 


2940 




GTCTGCACCA


GCTGGCCTTT 




JiTPCP ZL & 21 


AACAGAAGTA 


TTCATTCCTG 




vsAlalN^inx iVo 


CGACACCCTC 


CAACAGGGAG 


.^X^U 


L> 1 V^LrULrAi Nrf 1 


CCCTGCTGCT 


CATCCAGTCG 


JXOU 


GTCTTCbCCA 


ACAGCCTGGT 


GTACGGCGCC 


■? O >1 ft 


GACCTAGAGG 


AAGGCATCCA 


AACGCTGATG 


3300 


GGGCAGATCT 


TCAAGCAGAC 


CTACAGCAAG 


3360 


CTACTCAAGA 


ACTACGGGCT 


GCTCTACTGC 


3420 


TTCCTGCGCA 


TCGTGCAGTG 


CCGCTCTGTG 


3480 


TGGCATCCTG 


TGACCCCTCC 


CCAGTGCCTC 


3540 


CCACCAGCCT 


TGTCCTAATA 


AAATTAAGTT 


3600 
GCATCAAAAA AAAAAAAAAG CTAGCGGCCG
ATTTACAAGA AGAAGTGGGG AATGAAGAAT
AACTTTCACC TCGGAGCCCA CCCCCTCCCA
CTGGAACAAC CACAGAATGT TTCAACAGGC
CTTTGGTTGA GTAAATTTGT GGTTGTTAAA
CAGTTTGTGG TTTTTTCCTT TAAAAGCTTG
CTACCCTGTG CAAAGGTGTA TGAGTTTCGA
CTTTATTTCG ACCCCAGAGC TCTGGTCTGT
TACCTTCTAC ATTTTATGTA TGGTCTCAGT
TTGAGTGTCT GAGTGAGGGT CTTCCCTCGA
TTCGAGAATC TTTCATTTGG TGCATTGGCC
TACCGAGCTC GAATTCCGGT CTCCCTATAG
CATTAATGAA TCGGCCAACG CGCGGGGAGA
TCCTCGCTCA CTGACTCGCT GCGCTCGGTC
TCAAAGGCGG TAATACGGTT ATCCACAGAA
GCAAAAGGCC AGCAAAAGGC CAGGAACCGT
AGGCTCCGCC CCCCTGACGA GCATCACAAA
CCGACAGGAC TATAAAGATA CCAGGCGTTT
GTTCCGACCC TGCCGCTTAC CGGATACCTG
CTTTCTCATA GCTCACGCTG TAGGTATCTC
GGCTGTGTGC ACGAACCCCC CGTTCAGCCC
CTTGAGTCCA ACCCGGTAAG ACACGACTTA
ATTAGCAGAG CGAGGTATGT AGGCGGTGCT
GGCTACACTA GAAGGACAGT ATTTGGTATC
AAAAGAGTTG GTAGCTCTTG ATCCGGCAAA
GTTTGCAAGC AGCAGATTAC GCGCAGAAAA
TCTACGGGGT CTGACGCTCA GTGGAACGAA
TTATCAAAAA GGATCTTCAC CTAGATCCTT
TAAAGTATAT ATGAGTAAAC TTGGTCTGAC
ATCTCAGCGA TCTGTCTATT TCGTTCATCC
ACTACGATAC GGGAGGGCTT ACCATCTGGC
CGCTCACCGG CTCCAGATTT ATCAGCAATA
AGTGGTCCTG CAACTTTATC CGCCTCCATC



CTAGACTTCT 


GAAATTCTAA 


GATTAGAATT 


3660 


AAAAAATTAC 


TGGCCTCTTG 


TGAGAACATG 


3720 


TCTGGAAAAC 


ATACTTGAGA 


AAAACATTTT 


3780 


CAGATGTATT 


GCCAAACACA 


GGATATGACT 


3840 


CTTCCCCTAT 


TCCCTCCCCA 


TTCCCCCTCC 


3900 


TGAAAAATTT 


GAGTCGTCGT 


CGAGACTCCT 


3960 


CCCCAGAGCT 


CTGTGTGCTT 


TCTGTTGCTG 


4020 


GTGCTTTCAT 


GTCGCTGCTT


TATTAAATCT 


4080 


GTCTTCTTGG 


GTACGCGGCT 


GTCCCGGGAC 


4140 


GGGTCTTTCA 


TTTGGTACAT 


GGGCCGGGAA 


4200 

m mm ^0 W 


GGGAATTCGA 


AAATCTTTCA 


GATCCCCGGG 


4260 


TGAGTCGTAT 


TAATTTCGAT 


AAGCCAGCTG 


4320 


GGCGGTTTGC 


GTATTGGGCG 


CTCTTCCGCT 


4380 


GTTCGGCTGC 


GGCGAGCGGT 


ATCAGCTCAC 


4440 


TCAGGGGATA ACGCAGGAAA 
AAAAAGGCCG CGTTGCTGGC 


GAACATGTGA 
GTTTTTCCAT 


4500 
4560 


AATCGACGCT 


CAAGTCAGAG 


GTGGCGAAAC 


4620 


CCCCCTGGAA 


GCTCCCTCGT 


GCGCTCTCCT 


4680 


TCCGCCTTTC 


TCCCTTCGGG 


AAGCGTGGCG 


4740 


AGTTCGGTGT 


AGGTCGTTCG 


CTCCAAGCTG 


4800 


GACCGCTGCG 


CCTTATCCGG 


TAACTATCGT 


4860 


TCGCCACTGG 


CAGCAGCCAC 


TGGTAACAGG 


4920


ACAGAGTTCT 


TGAAGTGGTG 


GCCTAACTAC 


4980 


TGCGCTCTGC 


TGAAGCCAGT 


TACCTTCGGA 


5040 


CAAACCACCG 


CTGGTAGCGG 


TGGTTTTTTT 


5100 


AAAGGATCTC 


AAGAAGATCC 


TTTGATCTTT 


5160 


AACTCACGTT 


AAGGGATTTT 


GGTCATGAGA 


5220 


TTAAATTAAA AATGAAGTTT 


TAAATCAATC 


5280 


AGTTACCAAT 


GCTTAATCAG 


TGAGGCACCT 


5340 


ATAGTTGCCT


GACTCCCCGT 


CGTGTAGATA 


5400 


CCCAGTGCTG 


CAATGATACC 


GCGAGACCCA 


5460 


AACCAGCCAG 


CCGGAAGGGC 


CGAGCGCAGA 


5520 


CAGTCTATTA 


ATTGTTGCCG 


GGAAGCTAGA 


5580 
GTAAGTAGTT


CGCCAGTTAA 


TAGTTTGCGC 


AACGTTGTTG 


w wrV X X X 


AGGCATCGTG 


5640 


GTGTCACGCT 


CGTCGTTTGG 

A ^^^^ AAA 


TATGGCTTCA 


TTCAGCTCCG 


GTTCCCAACG 

W A A WW wm«ww 


ATCAAGGCGA 


5700 


GTTACATGAT 


CCCCCATGTT 


GTGCAAAAAA


GCGGTTAGCT 


CCTTCGGTCC 

WW A A www A WW 


TCCGATCGTT 


5760 


GTCAGAAGTA 


AGTTGGCCGC 


AGTGTTATCA 


CTCATGGTTA 


TGGCAGCACT 


GCATAATTCT 


5820 




TGCCATCCGT 


AAGATGCTTT 


TCTGTGACTG 


GTGAGTACTC 

XJ X VS^«*\J X J* V# X x# 


AACCAAGTCA 


5880 


TTCTGAGAAT 


AGTGTATGCG 


GCGACCGAGT 


TGCTCTTGCC 


CGGCGTCAAT 

WW www i W«U« A 


ACGGGATAAT 


5940 


ACCGCGCCAC 


ATAGCAGAAC 


TTTAAAAGTG 

AAA «VW»W A W 


CTCATCATTG 

W A wfl A w«» A A w 


GAAAACGTTC 

wnnTmV^w A A w 


TTCGGGGCGA 


6000 


AAACTCTCAA 


GGATCTTACC 


GCTGTTGAGA 


TCCAGTTCGA 

AwW«%wA A ww«l 


TGTAACCCAC


TCGTGCACCC 


6060 


AACTGATCTT 


CAGCATCTTT 


TACTTTCACC 

A J^W AAA WaXw W 


AGCGTTTCTG 

n w w w AAA w A w 


GGTGAGPAAA 

WW A wnw V«Ann 


AACAGGAAGG 


6120 


CAAAATGCCG


CAAAAAAGGG


AATAAGGGCG 


ACACGGAAAT 


GTTGAATACT 

w A A w/V^ A «%w A 


CATACTCTTC 


6180 


CTTTTTCAAT 


ATTATTGAAG 

A A A A 


CATTTATCAG 

w*» AAA <• A W**W 


GGTTATTGTC 

WW A A A A w A w 


TCATGAGCGG 

A w«X A wX^ W W WW 


ATACATATTT 


6240 


GAATGTATTT 


AGAAAAATAA 


ACAAATAGGG 


GTTCCGCGCA 


CATTTCCCCG 


AAAAGTGCCA 


6300 


CCTGACGTCT 


AAGAAACCAT 


TATTATCATG 


ACATTAACCT 


ATAAAAATAG 


GCGTATCACG 


6360 


AGGCCCTTTC 


GTCTCGCGCG 


TTTCGGTGAT 


GACGGTGAAA 


ACCTCTGACA 


CATGCAGCTC 


6420 


CCGGAGACGG 


TCACAGCTTG 
GTGTTGGCGG

\3 A \J A A WW wwU 


TCTGTAAGCG 
GTGTCGGGGC 

\9 A w A www^9 WW 


GATGCCGGGA 
TGGCTTAACT 


GCAGACAAGC 
ATGCGGCATC 


CCGTCAGGGC 
AGAGCAGATT 


6480 
6540 




TGCACCATAT 

X V WAw A n A 


CGACGCTCTC 

w wJ^w w w A w A w 


CCTTATGCGA CTCCTGCATT 


AGGAAGCAGC 


6600 


CCAGTAGTAG 


GTTGAGGCCG 


TTGAGCACCG 

A A WakWWjniWWW 


CCGCCGCAAG 


GAATGGTGCA 


AGGAGATGGC 


6660 


GCCCAACAGT 


CCCCCGGCCA 

w w w wX9^« w«l 


CGGGGCCTGC 

WW WW WW W A WW 


CACCATACCC 


ACGCCGAAAC 


AAGCGCTCAT 


6720 


GAGCCCGAAG 


TGGCGAGCCC 

A w wwwJ^wwww 


GATCTTCCCC 

wC» A W A A Www W 


ATCGGTGATG 


TCGGCGATAT 


AGGCGCCAGC 


6780 


AACCGCACCT 

• V»WW W WClW w X 


GTGGCGCCGG 

W A WW WW WW WW 


TGATGCCGGC 

A a WW WW WW 


CACGATGCGT 


CCGGCGTAGA 


GGATCTGGCT 


6840 


AGCGATGACC 


CTGCTGATTG 


GTTCGCTGAC 


CATTTCCGGG 


GTGCGGAACG 


GCGTTACCAG 


6900 


AAACTCAGAA 


GGTTCGTCCA 


ACCAAACCGA 


CTCTGACGGC 


AGTTTACGAG 


AGAGATGATA 


6960 


GGGTCTGCTT 


CAGTAAGCCA 


GATGCTACAC 


AATTAGGCTT 


GTACATATTG 


TCGTTAGAAC 


7020 


GCGGCTACAA


TTAATACATA 


ACCTTATGTA 


TCATACACAT 


ACGATTTAGG


TGACACTATA 


7080 



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6795 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
AATGAAAGAC CCCACCTGTA GGTTTGGCAA GCTAGCTTAA GTAACGCCAT 



TTTGCAAGGC 



60 
ATGGAAAAAT ACATAACTGA GAATAGAGAA
AGCTGAATAT GGGCCAAACA GGATATCTGT
AAGAACAGAT GGAACAGCTG AATATGGGCC
CCCCGGCTCA GGGCCAAGAA CAGATGGTCC
AGAGAACCAT CAGATGTTTC CAGGGTGCCC
TGAACTAACC AATCAGTTCG CTTCTCGCTT
ATAAAAGAGC CCACAACCCC TCACTCGGGG
GGTACCCGTG TATCCAATAA ACCCTCTTGC
CTTGGGAGGG TCTCCTCTGA GTGATTGACT
CTCGTCCGGG ATCGGGAGAC CCCTGCCCAG
TGGCCAGCAA CTTATCTGTG TCTGTCCGAT
CTGCGTCGGT ACTAGTTAGC TAACTAGCTC
GAGTTCGGAA CACCCGGCCG CAACCCTGGG
GGGACGCCTG GTGGACCCCT TTGAAGGCCA
TTCGAGTCCC ACCTCGTGCC CAGTTGCGAG
GTTGCGAGAT CGTGGGTTCG AGTCCCACCT
TCCCACCTCG TGTTTTGTTG CGAGATCGTG
GGATCGTGGG TTCGAGTCCC ACCTCGTGCA
CCATCTGATT CTTCTGGTTT CTCTTTTTGT
ACTGTTTTTC TAAAAATGGG ACAATCTGTG
TCGCTTGGTA ATTTTGTTTG TTTACGTTTG
TATCTTGTTT TTGTTTGTGG TTTACGGTTT
CAGACTTGGA CTGATGACTG ACGACTGTTT
AATCCTGTCA GATCCCTATG CTGACCACTT
GCTCAAGCTT CGAATTCTGC AGTCGACGGT
CAAGGTACGT AGCGGGGATC AATTCCGCCC
GGAATAAGGC CGGTGTGCGT TTGTCTATAT
CAATGTGAGG GCCCGGAAAC CTGGCCCTGT
CCCTCTCGCC AAAGGAATGC AAGGTCTGTT
AGCTTCTTGA AGACAAACAA CGTCTGTAGC
TGGCGACAGG TGCCTCTGCG GCCAAAAGCC
ACAACCCCAG TGCCACGTTG TGAGTTGGAT
AAGCGTATTC AACAAGGGGC TGAAGGATGC



GTTCAGATCA 


AGGTCAGGAA 


CAGATGGAAC 


120 


GGTAAGCAGT 


TCCTGCCCCG 


GCTCAGGGCC 


180 


AAACAGGATA 


TCTGTGGTAA 


GCAGTTCCTG 


240 


CCAGATGCGG 


TCCAGCCCTC 


AGCAGTTTCT 


300 


CAAGGACCTG 


AAATGACCCT 


GTGCCTTATT 


360 


CTGTTCGCGC 


GCTTCTGCTC 


CCCGAGCTCA 


420 


CGCCAGTCCT 


CCGATTGACT 


GAGTCGCCCG 


480 


AGTTGCATCG 


GACTTGTGGT 


CTCGCTGTTC 


540 


ACCCGTCAGC 


GGGGGTCTTT 


CATTTGGGGG 


600 


GGACCACCGA 


CCCACCACCG 


GGAGGTAAGC 


660 


TGTCTAGTGT 


CTATGACTGA 


TTTTATGCGC 


720 


TGTATCTGGC


GGACCCGTGG 


TGGAACTGAC 


780 


AGACGTCCCA 


GGAGGAACAG 


GGGAGGATCA 


840 


AGAGACCATT 
ATCGTGGGTT 


TGGGGTTGCG 
CGAGTCCCAC 


AGATCGTGGG 
CTCGTGTTTT 


900 
960 


CGCGTCTGGT 


CACGGGATCG 


TGGGTTCGAG 


1020 


GGTTCGAGTC 


CCACCTCGCG 


TCTGGTCACG 


1080 


GAGGGTCTCA 


ATTGGCCGGC 


CTTAGAGAGG 


1140 


CTTAGTCTCG 


TGTCCGCTCT 


TGTTGTGACT 


1200 


TCCACTCCCC 


TTTCTCTGAC 


TCTGGTTCTG 


1260 


TTTTTGTGAG 


TCGTCTATGT 


TGTCTGTTAC 


1320 


CTGTGTGTGT 


CTTGTGTGTC 


TCTTTGTGTT 


1380 


TTAAGTTATG 


CCTTCTAAAA 


TAAGCCTAAA 


1440 


CCTTTCAGAT 


CAACAGCTGC 


CCTTACTCGA


1500 


ACCGCGGCCG 


CTAACTAATA 


GCCCATTCTC 


1560 


CCCCCCTAAC 


GTTACTGGCC 


GAAGCCGCTT 


1620 


GTTATTTTCC 


ACCATATTGC 


CGTCTTTTGG 


1680 

A W W V 


CTTCTTGACG 


AGCATTCCTA 


GGGGTCTTTC 


1740 


GAATGTCGTG 


AAGGAAGCAG 


TTCCTCTGGA 


1800 


GACCCTTTGC 


AGGCAGCGGA ACCCCCCACC


1860 


ACGTGTATAA 


GATACACCTG 


CAAAGGCGGC 


1920 


AGTTGTGGAA 


AGAGTCAAAT 


GGCTCTCCTC 


1980 


CCAGAAGGTA 


CCCCATTGTA 


TGGGATCTGA 


2040 
V^uij 1 O^A^Ai




IbX i lAoiCb 


AGGTTAAAAA AACGTCTAGG 


2100 




uAUul3buAii,«u 




1 iUAAAAAv^A 


CGATACGGGA 


TCCACCGGTC 


2160 




\9 1 AAAouAuA 


AuAAu i 1 1 1 U 


AUAGuAvjTTG 


TCCCAATTCT 


TGTTGAATTA 


f> r\ r\ f\ 

2220 






L»AAAi 1 i iUi 


vjTCAVaTbbAG 


AGGGTGAAGG 


TGATGCAACA 



2280 


♦p 71 r*f*r* R 71 R IV ^ 
TACUoAAAAC 


a TACCU TTAA 


An lAi 1 IbL. 


A A /^n*r*r* a a 
AL 1 Ad GGAA 


AACTACCTGT 


TCCATGGCCA 


2340 


ACACT 1 bTCA 


LTACTl a LAU 


1 XAru^albii 


CAArbC 1x1 T 


CAAGATACCC 


AGATCATATG 


2400 


AAAUbVjUATb 


AL r 1 1 i 1 C AA 


bAw 1 ub wi 1 \j 


r^r^r^f^ a A/^/^fPT 
uVt^ubAAbb i 1 


ATGTACAGGA 


AAGAACTATA 


24 oU 


1 1 1 i 1 u AAAls 


Ai uAUuuoAA 


v.* i. AL«AA\aAL>A 


Ifb 1 X bAAb 


TCAAGTTTGA 


AGGTGATACC 



2b2U 


CTTGTTAATA 


GAATCGAGTT 


AAAAGGTATT 


GATTTTAAAG 


AAGATGGAAA 


CATTCTTGGA 


2580 


CACAAATTGG 


AATACAACTA 


TAACTCACAC 


AATGTATACA 


TCATGGCAGA 


CAAACAAAAG 


2640 


AATGGAACCA 


AAGTTAACTT 


CAAAATTAGA 


CACAACATTG 


AAGATGGAAG 


CGTTCAACTA 


2700 


GCAGACCATT 


ATCAACAAAA 


TACTCCAATT 


GGCGATGGCC 


CTGTCCTTTT 


ACCAGACAAC 


2760 


CATTACCTGT 


CCACACAATC 


TGCCCTTTCG 


AAAGATCCCA 


ACGAAAAGAG 


AGACCACATG 


2820 


GTCCTTCTTG 
TCCGGATCTA 


AGTTTGTAAC 
GATAACTGTA 


AGCTGCTGGG 
TCGATGGATC 


ATTACACATG 
CGAAGGCGGG 


GCATGGATGA 
GACAGCAGTG 


ACTATACAAG 
CAGTGGTGGA 


2880 
2940 


CAGAAAGCAA


GTGATCTAGG 


CCAGCAGCCT 


CCCTAAAGGG 


ACTTCAGCCC 


ACAAAGCCAA 


3000 


ACTTGTGGCT 


TTAATACAAG 


CTCTGTAAAT 



GGTAAAAAAA 


AAAAAGTCTA 


CACGGACAGC 


3060 


AGGTATGCTC 


TTGCCACTGT 



ACAGAGCAAT 



ATACAGACAA 


AGAGAACTGT 


TGACATCTGC


3120 


AGAGAAAGAC 


CTAAGATGCT 



GTGGCTAAAA 



GAAATCAGAT 


GGCAAATCTA ACCGCCCAGG 


3180 


CATCCTAAAG 


AGCAATGATC 


CTGACAGTCT 


GAAGACTATC 


AAGTTATAGA 


CAAATTAAGA 


3240 


CTGGTAAAAA 


AAACCCTGTA 



TAAAATAGTA 



AAAACTGAAA 


AAAGAAAACT 


AGTCCTCTCA 



3300 


TGAGAAGACA 


GACCTGACAT 



CTACTGAAAA 



ATAGACTTTA 


CTGGAAAAAA 


TATGTGTATG 


3360 


AATACCTTCT 


AGTTTTTGTG 



AACGTTCTCA 



AGATGGATAA 


AAGCTTTTCC 


TTGTAAAACG 


"3 A OA 

3420 


AGACTGATCA 



GATAGTCATC 



AAGAAGATTG 



TTAAAGAAAA 


TTTTCCAAGG 


TTCGGAGTGC 


3480


CAAAAGCAAT


AGTGTCAGAT 


AATGGTCCTG 


CCTTTGTTGC 


CCAGGTAAGT 


CAGGGTGTGG


3540 


CCAAGTATTT


AGAGGTCAAA 



TGAAAATTCC 


ATTGTGTGTA 


CAGACCTCAG 


AGCTCAGGAA 



3600 


AGATAAAAAA 


GAATAAATAA 


AACTCTAAAC 


AGACCTTGAC 


AAAATTAATC 


CTAGAGACTG 


3660 


GCACAGACTT 


ACTTGGTACT 


CCTTCCCCTT 


GCCCTATTTA 


GAACTGAGAA 


TACTCCCTCT 


3720 


TGATTCGGTT 


TTACTCTTTT 


TAAGATCCTT 


TATGGGGCTC 


CTATGCCATC 


ACTGTCTTAA 


3780 


ATGATGTGTT 


TAAACCTATG 


TTGTTATAAT 


AATGATCTAT 


ATGTTAAGTT 


AAAAGGCTTG 



3840 


CAGGTGGTGC 


AGAAAGAAGT 


CTGGTCACAA 


CTGGCTACAG 


TGAACAAGCT 


GGGTACCCCA 


3900 


AGGACATCTT 


ACCAGTTCCA 


GCCAGAGATC 


TGATCTACGA 


TCCCCGGGTC 


GACCCGGGTC 


3960 


GACCCTGTGG


AATGTGTGTC 


AGTTAGGGTG 


TGGAAAGTCC 


CCAGGCTCCC 


CAGCAGGCAG 


4020 


AAGTATGCAA 


AGCATGCATC 


TCAATTAGTC 


AGCAACCAGG 


TGTGGAAAGT 


CCCCAGGCTC 


4080 
CCCAGCAGGC AGAAGTATGC AAAGCATGCA
CCTAACTCCG CCCATCCCGC CCCTAACTCC
CTGACTAATT TTTTTTATTT ATGCAGAGGC
GAAGTAGTGA GGAGGCTTTT TTGGAGGCCT
AAGCACTCAG GGCGCAAGGG CTGCTAAAGG
AAACGGTGCT GACCCCGGAT GAATGTCAGC
AGCGCAAAGA GAAAGCAGGT AGCTTGCAGT
GTTTTATGGA CAGCAAGCGA ACCGGAATTG
AAGCCCTGCA AAGTAAACTG GATGGCTTTC
TCAAGATCTG ATCAAGAGAC AGGATGAGGA
CACGCAGGTT CTCCGGCCGC TTGGGTGGAG
ACAATCGGCT GCTCTGATGC CGCCGTGTTC
TTTGTCAAGA CCGACCTGTC CGGTGCCCTG
TCGTGGCTGG CCACGACGGG CGTTCCTTGC
GGAAGGGACT GGCTGCTATT GGGCGAAGTG
GCTCCTGCCG AGAAAGTATC CATCATGGCT
CCGGCTACCT GCCCATTCGA CCACCAAGCG
ATGGAAGCCG GTCTTGTCGA TCAGGATGAT
GCCGAACTGT TCGCCAGGCT CAAGGCGCGC
CATGGCGATG CCTGCTTGCC GAATATCATG
GACTGTGGCC GGCTGGGTGT GGCGGACCGC
ATTGCTGAAG AGCTTGGCGG CGAATGGGCT
GCTCCCGATT CGCAGCGCAT CGCCTTCTAT
CTCTGGGGTT CGAAATGACC GACCAAGCGA
CCACCGCCGC CTTCTATGAA AGGTTGGGCT
ATCTGCTGCT TGCAAACAAA AAAACCACCG
GAGCTACCAA CTCTTTTTCC GAAGGTAACT
GTCCTTCTAG TGTAGCCGTA GTTAGGCCAC
TACCTCGCTC TGCTAATCCT GTTACCAGTG
ACCGGGTTGG ACTCAAGACG ATAGTTACCG
GGTTCGTGCA CACAGCCCAG CTTGGAGCGA
CGTGAGCATT GAGAAAGCGC CACGCTTCCC
AGCGGCAGGG TCGGAACAGG AGAGCGCACG



TCTCAATTAG 


TCAGCAACCA 


TAGTCCCGCC 


4140 


GCCCAGTTCC 


GCCCATTCTC 


CGCCCCATGG 


4200 


CGAGGCCGCC 


TCGGCCTCTG 


AGCTATTCCA 


4260 


AGGCTTTTGC 


AAAAAGCTTC 


ACGCTGCCGC 


4320 


AAGCGGAACA 


CGTAGAAAGC 


CAGTCCGCAG 


4380 


TACTGGGCTA 


TCTGGACAAG 


GGAAAACGCA 


4440 


GGGCTTACAT 


GGCGATAGCT 


AGACTGGGCG 


4500 




L.UCL(.« 1 LToG 


TAAbbTTGGG 


43dU 






GCGCAGGGGA 


4 oZO 


i \Aj III \^\a\^n 


•P/^ ft T*P/^ ft ft ft 


ft ft T/^/^ ft T T/^ 

AGAIGGAI XG 


4 OoU 


nX9\a\^ Ink X VrU 


Inl KanK-t 1 u 


o a/^ ft ft ft ^ 




\,\3\3\^ l\jl LfiiV? 




C^UGo Jl 1 CI 1 


A Q Art 


GCAGCTGTGC 


TCGACGTTGT 


CACTGAAGCG 


H O OU 

4 920 


CCGGGGCAGG 


ATCTCCTGTC 


ATCTCACCTT 


4980 


GATGCAATGC 


GGCGGCTGCA 


TACGCTTGAT 


5040 


AAACATCGCA 


TCGAGCGAGC 


ACGTACTCGG 


5100 


CTGGACGAAG 


AGCATCAGGG 


GCTCGCGCCA 


5160 


ATGCCCGACG 


GCGAGGATCT 


CGTCGTGACC 


5220 


GTGGAAAATG 


GCCGCTTTTC 


TGGATTCATC 


5280 


TATCAGGACA 


TAGCGTTGGC 


TACCCGTGAT 


5340 


GACCGCTTCC 


TCGTGCTTTA 


CGGTATCGCC 


5400 


CGCCTTCTTG 


ACGAGTTCTT


CTGAGCGGGA 


5460 


CGCCCAACCT 


GCCATCACGA 


GATTTCGATT 


5520 


TCGGAATCGT 


TTTCCGGGAC 


GGAATTCGTA 


5580 


CTACCAGCGG 


TGGTTTGTTT 


GCCGGATCAA 


5640 


GGCTTCAGCA 


GAGCGCAGAT 


ACCAAATACT 


5700 


CACTTCAAGA 


ACTCTGTAGC 


ACCGCCTACA 


5760 


GCTGCTGCCA GTGGCGATAA 


GTCGTGTCTT 


5820 


GATAAGGCGC 


AGCGGTCGGG


CTGAACGGGG


5880 


ACGACCTACA 


CCGAACTGAG 


ATACCTACAG 


5940 


GAAGGGAGAA AGGCGGACAG 


GTATCCGGTA 


6000 


AGGGAGCTTC 


CAGGGGGAAA 


CGCCTGGTAT 


6060 
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^1*1* T R 'P 21 T 






T a p T* 1* a r» 


GTCGATTTTT 


GTGATGCTCG 


6120 








oA/iAnAV«uLrU 


ar*f a nf^/v^f^n 
A\aLAAui3\^L>o 


AlaATGCuCCG 


CCTCGAGAAC 


6180 








a ^^fp a a ^/*a T 
u A\« I AA^UA X 


i^VjIjVsbbAATT 


GCCuCTGGAA 


TAGGAACAGG 


6240 




\3A\^1A\^ 1 iaO 1 




PTf^ar'na aTT* 
UiW^bUAAl i 


Ci^AbUAu^ X L 


CAAGCCGCAG 


TACAGGATGA 


6300 


10 




<^ T'nr' a a a a a T 
vsTTvaAAAAAl 


r*a a T^Ti^T a a 
LAAi CTCTAA 


pr'Ta p a a a TV 
CCTAwiAAAG 


TCTCTCACTT 


CCCTGTCTGA 


6360 




r*acaaTr'i^aa 
UAuAAiCuAA 


bVsVsuCL* i AbA 


V^X XbX XAX X X 


uXAAAAGAAG 


GAGGGCTGTG 


6420 




TGCTGCTCTA 


AAAGAAGAAT 


GTTGCTTCTA 


TGCGGACCAC 


ACAGGACTAG 


TGAGAGACAG 


6480 






TTpai^a^ar'a 
i 1 uAvjAo Au A 




bAbAi-AtjAAA 


v^XGriTGAGl 


CAACTCAAGG 


6540 






uuAUlul 1 in, 


AuAla AX wuuu 


X X ow XXX AULr 


a /^^ntTP a T a 1* 

ACCTTGATAT 


CTACCATTAT 


6600 




GGGACCCCTC 
AGTCCAATTT 


ATTGTACTCC 
GTTAAAGACA 


TAATGATTTT 
GGATATCAGT 


GCTCTTCGGA 
GGTCCAGGCT 


CCCTGCATTC 
CTAGTTTTGA 


TTAATCGATT 
CTCAACAATA 


6660 
6720 




TCACCAGCTG 


AAGCCTATAG 


AGTACGAGCC 


ATAGATAAAA 


TAAAAGATTT 


TATTTAGTCT 


6780 


CCAGAAAAAG GGGGG 6795

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9093 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
AATGAAAGAC CCCACCTGTA GGTTTGGCAA GCTAGCTTAA 



GTAACGCCAT 


TTTGCAAGGC 


60 




ATGGAAAAAT 


ACATAACTGA 


GAATAGAGAA 


GTTCAGATCA 


AGGTCAGGAA 


CAGATGGAAC 


120 




AGCTGAATAT


GGGCCAAACA 


GGATATCTGT 


GGTAAGCAGT 


TCCTGCCCCG 


GCTCAGGGCC 


180 




AAGAACAGAT 


GGAACAGCTG 


AATATGGGCC 


AAACAGGATA 


TCTGTGGTAA 


GCAGTTCCTG 


240 




CCCCGGCTCA
AGAGAACCAT 


GGGCCAAGAA 
CAGATGTTTC 


CAGATGGTCC 
CAGGGTGCCC 


CCAGATGCGG 
CAAGGACCTG 


TCCAGCCCTC 
AAATGACCCT 


AGCAGTTTCT 
GTGCCTTATT 


300 
360 




TGAACTAACC 


AATCAGTTCG 


CTTCTCGCTT 


CTGTTCGCGC 


GCTTCTGCTC 


CCCGAGCTCA 


420 




ATAAAAGAGC 


CCACAACCCC 


TCACTCGGGG 


CGCCAGTCCT 


CCGATTGACT 


GAGTCGCCCG 


480 




GGTACCCGTG 


TATCCAATAA ACCCTCTTGC 


AGTTGCATCC 


GACTTGTGGT 


CTCGCTGTTC 


540 


60 


CTTGGGAGGG 
CTCGTCCGGG 


TCTCCTCTGA 
ATCGGGAGAC 


GTGATTGACT 
CCCTGCCCAG 


ACCCGTCAGC GGGGGTCTTT 
GGACCACCGA CCCACCACCG 


CATTTGGGGG 
GGAGGTAAGC 


600 
660 




TGGCCAGCAA 


CTTATCTGTG 


TCTGTCCGAT 


TGTCTAGTGT 


CTATGACTGA 


TTTTATGCGC 


720 


65 


CTGCGTCGGT 


ACTAGTTAGC 


TAACTAGCTC 


TGTATCTGGC 


GGACCCGTGG 


TGGAACTGAC 


780 



WO 98/38326 
PCT/US98/03918 

GAGTTCGGAA CACCCGGCCG CAACCCTGGG AGACGTCCCA GGAGGAACAG GGGAGGATCA 840 
GGGACGCCTG GTGGACCCCT TTGAAGGCCA AGAGACCATT TGGGGTTGCG AGATCGTGGG 900 
TTCGAGTCCC ACCTCGTGCC CAGTTGCGAG ATCGTGGGTT CGAGTCCCAC CTCGTGTTTT 960 
GTTGCGAGAT CGTGGGTTCG AGTCCCACCT CGCGTCTGGT CACGGGATCG TGGGTTCGAG 1020 
TCCCACCTCG TGTTTTGTTG CGAGATCGTG GGTTCGAGTC CCACCTCGCG TCTGGTCACG 1080 
GGATCGTGGG TTCGAGTCCC ACCTCGTGCA GAGGGTCTCA ATTGGCCGGC CTTAGAGAGG 1140 
CCATCTGATT CTTCTGGTTT CTCTTTTTGT CTTAGTCTCG TGTCCGCTCT TGTTGTGACT 1200 
ACTGTTTTTC TAAAAATGGG ACAATCTGTG TCCACTCCCC TTTCTCTGAC TCTGGTTCTG 1260 
TCGCTTGGTA ATTTTGTTTG TTTACGTTTG TTTTTGTGAG TCGTCTATGT TGTCTGTTAC 1320 
TATCTTGTTT TTGTTTGTGG TTTACGGTTT CTGTGTGTGT CTTGTGTGTC TCTTTGTGTT 1380 
CAGACTTGGA CTGATGACTG ACGACTGTTT TTAAGTTATG CCTTCTAAAA TAAGCCTAAA 1440 
AATCCTGTCA GATCCCTATG CTGACCACTT CCTTTCAGAT CAACAGCTGC CCTTACGTAT 1500 
CGATGGATCC CTCGACTAAC TAATAGCCCA TTCTCCAAGG TCGAGCGGGA TCAATTCCGC 1560 
CCCCCCCCTA ACGTTACTGG CCGAAGCCGC TTGGAATAAG GCCGGTGTGC GTTTGTCTAT 1620 
ATGTTATTTT CCACCATATT GCCGTCTTTT GGCAATGTGA GGGCCCGGAA ACCTGGCCCT 1680 
GTCTTCTTGA CGAGCATTCC TAGGGGTCTT TCCCCTCTCG CCAAAGGAAT GCAAGGTCTG 1740 
TTGAATGTCG TGAAGGAAGC AGTTCCTCTG GAAGCTTCTT GAAGACAAAC AACGTCTGTA 1800 
GCGACCCTTT GCAGGCAGCG GAACCCCCCA CCTGGCGACA GGTGCCTCTG CGGCCAAAAG 1860 
CCACGTGTAT AAGATACACC TGCAAAGGCG GCACAACCCC AGTGCCACGT TGTGAGTTGG 1920 
ATAGTTGTGG AAAGAGTCAA ATGGCTCTCC TCAAGCGTAT TCAACAAGGG GCTGAAGGAT 1980 
GCCCAGAAGG TACCCCATTG TATGGGATCT GATCTGGGGC CTCGGTGCAC ATGCTTTACA 2040 
TGTGTTTAGT CGAGGTTAAA AAAACGTCTA GGCCCCCCGA ACCACGGGGA CGTGGTTTTC 2100 
CTTTGAAAAA CACGATAATA ATCATGGGCG CGGATCCCGT CGTTTTACAA CGTCGTGACT 2160 
GGGAAAACCC TGGCGTTACC CAACTTAATC GCCTTGCAGC ACATCCCCCT TTCGCCAGCT 2220 
GGCGTAATAG CGAAGAGGCC CGCACCGATC GCCCTTCCCA ACAGTTGCGC AGCCTGAATG 2280 
GCGAATGGCG CTTTGCCTGG TTTCCGGCAC CAGAAGCGGT GCCGGAAAGC TGGCTGGAGT 2340 
GCGATCTTCC TGAGGCCGAT ACTGTCGTCG TCCCCTCAAA CTGGCAGATG CACGGTTACG 2400 
ATGCGCCCAT CTACACCAAC GTAACCTATC CCATTACGGT CAATCCGCCG TTTGTTCCCA 2460 
CGGAGAATCC GACGGGTTGT TACTCGCTCA CATTTAATGT TGATGAAAGC TGGCTACAGG 2520 
AAGGCCAGAC GCGAATTATT TTTGATGGCG TTAACTCGGC GTTTCATCTG TGGTGCAACG 2580 
GGCGCTGGGT CGGTTACGGC CAGGACAGTC GTTTGCCGTC TGAATTTGAC CTGAGCGCAT 2640 
TTTTACGCGC CGGAGAAAAC CGCCTCGCGG TGATGGTGCT GCGTTGGAGT GACGGCAGTT 2700 
ATCTGGAAGA TCAGGATATG TGGCGGATGA GCGGCATTTT CCGTGACGTC TCGTTGCTGC 2760 
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ATAAACCGAC 


TACACAAATC 


AGCGATTTCC 


ATGTTGCCAC 


TCGCTTTAAT 


GATGATTTCA 


2820 


GCCGCGCTGT 


ACTGGAGGCT 


GAAGTTCAGA 


TGTGCGGCGA 


GTTGCGTGAC 


TACCTACGGG 


2880 


TAACAGTTTC 


TTTATGGCAG 


GGTGAAACGC 


AGGTCGCCAG 


CGGCACCGCG 


CCTTTCGGCG 


2940 


GTGAAATTAT 


CGATGAGCGT 


GGTGGTTATG 


CCGATCGCGT 


CACACTACGT 


CTGAACGTCG 


3000 


AAAACCCGAA ACTGTGGAGC 


GCCGAAATCC 


CGAATCTCTA 


TCGTGCGGTG 


GTTGAACTGC 


3060 


ACACCGCCGA 


CGGCACGCTG 


ATTGAAGCAG 


AAGCCTGCGA 


TGTCGGTTTC 


CGCGAGGTGC 


3120 


GGATTGAAAA 


TGGTCTGCTG 


CTGCTGAACG 


GCAAGCCGTT 


GCTGATTCGA 


GGCGTTAACC 


3180 


GTCACGAGCA 


TCATCCTCTG 


CATGGTCAGG 


TCATGGATGA 


GCAGACGATG 


GTGCAGGATA 


3240 


TCCTGCTGAT 


GAAGCAGAAC 


AACTTTAACG 


CCGTGCGCTG 


TTCGCATTAT 


CCGAACCATC 


3300 


CGCTGTGGTA CACGCTGTGC 


GACCGCTACG 


GCCTGTATGT 


GGTGGATGAA 


GCCAATATTG 


3360 


AAACCCACGG 


CATGGTGCCA 


ATGAATCGTC 


TGACCGATGA 


TCCGCGCTGG 


CTACCGGCGA 


3420 


TGAGCGAACG 
GGTCGCTGGG 


CGTAACGCGA 
GAATGAATCA 


ATGGTGCAGC 
GGCCACGGCG 


GCGATCGTAA 
CTAATCACGA 


TCACCCGAGT 
CGCGCTGTAT 


GTGATCATCT 
CGCTGGATCA 


3480 
3540 


AATCTGTCGA 


TCCTTCCCGC 


CCGGTGCAGT 


ATGAAGGCGG 


CGGAGCCGAC 


ACCACGGCCA 


3600 


CCGATATTAT 


TTGCCCGATG 


TACGCGCGCG 


TGGATGAAGA 


CCAGCCCTTC 


CCGGCTGTGC 


3660 


CGAAATGGTC 


CATCAAAAAA 


TGGCTTTCGC 


TACCTGGAGA 


GACGCGCCCG 


CTGATCCTTT 


3720 


GCGAATACGC 


CCACGCGATG 


GGTAACAGTC 


TTGGCGGTTT 


CGCTAAATAC 


TGGCAGGCGT 


3780 


TTCGTCAGTA 


TCCCCGTTTA 


CAGGGCGGCT 


TCGTCTGGGA 


CTGGGTGGAT 


CAGTCGCTGA 


3840 


TTAAATATGA 


TGAAAACGGC 


AACCCGTGGT 


CGGCTTACGG 


CGGTGATTTT 


GGCGATACGC 


3900 


CGAACGATCG 


CCAGTTCTGT 


ATGAACGGTC 


TGGTCTTTGC 


CGACCGCACG 


CCGCATCCAG 


3960 


CGCTGACGGA 


AGCAAAACAC 


CAGCAGCAGT 


TTTTCCAGTT 


CCGTTTATCC 


GGGCAAACCA 


4020 


TCGTiAGTGAC 


CAGCGAATAC 


CTGTTCCGTC 


ATAGCGATAA 


CGAGCTCCTG 


CACTGGATGG 


4080 


TGGCGCTGGA 


TGGTAAGCCG 


CTGGCAAGCG 


GTGAAGTGCC 


TCTGGATGTC 


GCTCCACAAG 


4140 


GTAAACAGTT 


GATTGAACTG 


CCTGAACTAC 


CGCAGCCGGA 


GAGCGCCGGG 


CAACTCTGGC 


4200 


TCACAGTACG 


CGTAGTGCAA 


CCGAACGCGA 


CCGCATGGTC 


AGAAGCCGGG 


CACATCAGCG 


4260 


CCTGGCAGCA 


GTGGCGTCTG 


GCGGAAAACC 


TCAGTGTGAC 


GCTCCCCGCC 


GCGTCCCACG 


4320 


CCATCCCGCA TCTGACCACC 


AGCGAAATGG 


ATTTTTGCAT 


CGAGCTGGGT 


AATAAGCGTT 


4380 


GGC/^TTTAA 


CCGCCAGTCA 


GGCTTTCTTT 


CACAGATGTG 


GATTGGCGAT 


AAAAAACAAC 


4440 


TGCTGACGCC 


GCTGCGCGAT 


CAGTTCACCC 


GTGCACCGCT 


GGATAACGAC 


ATTGGCGTAA 


4500 


GTGAAGCGAC 


CCGCATTGAC 


CCTAACGCCT 


GGGTCGAACG 


CTGGAAGGCG 


GCGGGCCATT 


4560 


ACCAGGCCGA AGCAGCGTTG 


TTGCAGTGCA 


CGGCAGATAC 


ACTTGCTGAT 


GCGGTGCTGA 


4620 


TTACGACCGC 


TCACGCGTGG 


CAGCATCAGG 


GGAAAACCTT 


ATTTATCAGC 


CGGAAAACCT 


4680 


ACCGGATTGA TGGTAGTGGT 


CAAATGGCGA 


TTACCGTTGA 


TGTTGAAGTG 


GCGAGCGATA 


4740 


CACCGCa^TCC 


GGCGCGGATT 


GGCCTGAACT 


GCCAGCTGGC 


GCAGGTAGCA 


GAGCGGGTAA 


4800 
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ACTGGCTCGG ATTAGGGCCG CAAGAAAACT 
ACCGCTG6GA TCTGCCATTG TCAGACATGT 

5 

GTCTGCGCTG CGGGACGCGC GAATTGAATT 
AGTTCAACAT CAGCCGCTAC AGTCAACAGC 
10 TGCACGCGGA AGAAGGCACA TGGCTGAATA 
ACGACTCCTG GAGCCCGTCA GTATCGGCGG 
ACCAGTTGGT CTGGTGTCAA AAATAATAAT 

15 

CAGCAGTGCA GTGGTGGACA GAAAGCAAGT 

TTCAGCCCAC AAAGCCAAAC TTGTGGCTTT 

20 AAAGTCTACA CGGACAGCAG GTATGCTCTT 

AGAACTGTTG ACATCTGCAG AGAAAGACCT 
CAAATCTAAC CGCCCAGGCA TCCTAAAGAG 

25 GTTATAGACA AATTAAGACT GGTAAAAAAA 

AGAAAACTAG TCCTCTCATG AGAAGACAGA 

GGAAAAAATA TGTGTATGAA TACCTTCTAG 

30 

GCTTTTCCTT GT7\AAACGAG ACTGATCAGA 
TTCCAAGGTT CGGAGTGCCA AAAGCAATAG 
35 AGGTAAGTCA GGGTGTGGCC AAGTATTTAG 
GACCTCAGAG CTCAGGAAAG ATAAAAAAGA 
AATTAATCCT AGAGACTGGC ACAGACTTAC 

40 

ACTGAGAATA CTCCCTCTTG ATTCGGTTTT 
ATGCCATCAC TGTCTTAAAT GATGTGTTTA 
45 GTTAAGTTAA AAGGCTTGCA GGTGGTGCAG 
AACAAGCTGG GTACCCCAAG GACATCTTAC 
CCCGGGTCGA CCCGGGTCGA CCCTGTGGAA 

50 

AGGCTCCCCA GCAGGCAGAA GTATGCAAAG 
TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG 
55 AGCAACCATA GTCCCGCCCC TAACTCCGCC 
CCATTCTCCG CCCCATGGCT GACTAATTTT 
GGCCTCTGAG CTATTCCA6A AGTAGTGAGG 

60 

AAAGCTTCAC GCTGCCGCAA GCACTCAGGG 
TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA 
65 TGGACAAGGG AAAACGCAAG CGCAAAGAGA 



ATCCCGACCG 


CCTTACTGCC 


GCCTGTTTTG 


4860 


ATACCCCGTA 


CGTCTTCCCG 


AGCGAAAACG 


4920 


ATGGCCCACA 


CCAGTGGCGC 


GGCGACTTCC 


4980 


AACTGATGGA AACCAGCCAT 


CGCCATCTGC 


5040 


TCGACGGTTT 


CCATATGGGG 


ATTGGTGGCG 


5100 


AATTCCAGCT 


GAGCGCCGGT 




DloU 


AACCGGGCAG 


GGG6GATCCG 


AAGGCGGGGA 


C O O A 


GATCTAGGCC 


AGCAGCCTCC 






AATACAAGCT 


CTGTAAATGG 






GCCACTGTAC 


AGAGCAATAT 




uu 


AAGATGCTGT 
CAATGATCCT 


GGCTAAAAGA 
GACAGTCTGA 


AGACTATCAA 


5520 


ACCCTGTATA 


Ai\ATAGTAAA 


AACTGAAAAA 


5580 


CCTGACATCT 


ACTGAAAAAT 


AGACTTTACT 


5640 


TTTTTGTGAA 


CGTTCTCAAG 


ATGGATAAAA 


5700 


TAGTCATCAA 


GAAGATTGTT 


AAAGAAAATT 


5760 


TGTCAGATAA 


TGGTCCTGCC 


TTTGTTGCCC 


5820 


AGGTCAAATG 


AAAATTCCAT 


TGTGTGTACA 


5880 


ATAAATAAAA 


CTCTAAACAG 


ACCTTGACAA 


5940 


TTGGTACTCC 


TTCCCCTTGC 


CCTATTTAGA 


6000 


ACTCTTTTTA 


AGATCCTTTA 


TGGGGCTCCT 


6060 


AACCTATGTT 


GTTATAATAA 


TGATCTATAT 


6120 


AAAGAAGTCT 


GGTCACAACT 


GGCTACAGTG 


6180 


CAGTTCCAGC 


CAGAGATCTG 


ATCTACGATC 


6240 


TGTGTGTCAG 


TTAGGGTGTG 


GAAAGTCCCC 


6300 


CATGCATCTC 


AATTAGTCAG 


CAACCAGGTG 


6360 


AAGTATGCAA 


AGCATGCATC 


TCAATTAGTC 


6420 


CATCCCGCCC 


CTAACTCCGC 


CCAGTTCCGC 


6480 


TTTTATTTAT 


GCAGAGGCCG 


AGGCCGCCTC 


6540 


AGGCTTTTTT 


GGAGGCCTAG 


GCTTTTGCAA 


6600 


CGCAAGGGCT 


GCTAAAGGAA 


GCGGAACACG 


6660 


CCCCGGATGA 


ATGTCAGCTA 


CTGGGCTATC 


6720 


AAGCAGGTAG 


CTTGCAGTGG 


GCTTACATGG 


6780 
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CGATAGCTAG ACTGGGCGGT TTTATGGACA 



CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA 
5 ATCTGATGGC GCAGGGGATC AAGATCTGAT 



ATTGAACAAG ATGGATT6CA C6CAGGTTCT 



TATGACTG6G CACAACAGAC AATCGGCTGC 

10 

CAGGGGCGCC CGGTTCTTTT TGTCAAGACC 



GACGAGGCAG CGCGGCTATC GTGGCTGGCC 
15 GACGTTGTCA CTGAAGCGGG AAGGGACTGG 



CTCCTGTCAT CTCACCTTGC TCCTGCCGAG 



CGGCTGCATA CGCTTGATCC GGCTACCTGC 

20 

GAGCGAGCAC GTACTC6GAT GGAAGCCGGT 
CATCA6G6GC TCGCGCCAGC CGAACTGTTC 

GAGGATCTCG TCGTGACCCA TGGCGATGCC 

25 

CGCTTTTCTG GATTCATCGA CTGTGGCCGG 



GCGTTGGCTA CCCGTGATAT TGCTGAAGAG 
30 GTGCTTTACG GTATCGCCGC TCCCGATTCG 



GAGTTCTTCT GAGCGGGACT CTGGGGTTCG 



CATCACGAGA TTTCGATTCC ACCGCCGCCT 

35 

TCCGGGACGG TU^TTCGTAAT CTGCTGCTTG 



GTTTGTTTGC CGGATCAAGA GCTACCAACT 
40 GCGCAGATAC CAAATACTGT CCTTCTAGTG 



TCTGTAGCAC CGCCTACATA CCTCGCTCTG 



GGCGATAA6T CGTGTCTTAC CGGGTTGGAC 

45 

CGGTCGGGCT GAACGGGGGG TTCGTGCACA 



GAACTGAGAT ACCTACAGCG TGAGCATTGA 
50 GCGGACAGGT ATCCGGTAAG CGGCAGGGTC 



GGGGGAAACG CCTGGTATCT TTATAGTCCT 



CGATTTTTGT GATGCTCGTC AGGGGGGCGG 

55 

ATGCGCCGCC TCGAGAACCC TGGCCCTATT 



CGCTGGAATA GGAACAGGGA CTACTGCTCT 
60 AGCCGCAGTA CAGGATGATC TCAGGGAGGT 



TCTCACTTCC CTGTCTGAAG TTGTCCTACA 



AAAAGAAGGA GG6CTGTGTG CTGCTCTAAA 

65 

AGGACTAGTG AGAGACAGCA TGGCCAAATT 
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GCAAGCGAAC 


CGGAATTGCC 


AGCTGGGGCG 


6840 


GTAAACTGGA 


TGGCTTTCTT 


GCCGCCAAGG 


6900 


CAAGAGACAG 


GATGAGGATC 


GTTTCGCATG 


6960 


UCbGUCCsCTT 


GGGTGGAGAG 


GCTATTCGGC 


7020 




CCGTGTTCCG 


GCTGTCAGCG 


n rt o 
7080 


GACCTGTCCG 


GTGCCCTGAA 


TGAACTGCAG 


7140 


ACGACGGGCG 


TTCCTTGCGC 


AGCTGTGCTC 


7200 


CTGCTATTGG 


GCGAAGTGCC 


GGGGCAGGAT 


7260 


AAAGTATCCA 


TCATGGCTGA 


TGCAATGCGG 


7320 


CCATTCGACC 


ACCAAGCGAA ACATCGCATC 


7380 


CTTGTCGATC 
GCCAGGCTCA 


AGGATGATCT 
AGGCGCGCAT 


GGACGAAGAG 
GCCCGACGGC 


7440 
7500 


TGCTTGCCGA 


ATATCATGGT 


GGAAAATGGC 


7560 


CTGGGTGTGG 


CGGACCGCTA 


TCAGGACATA 


7620 


CTTGGCGGCG 


AATGGGCTGA 


CCGCTTCCTC 


7680 


CAGCGCATCG 


CCTTCTATCG 


CCTTCTTGAC 


7740 


AAATGACCGA 


CCAAGCGACG 


CCCAACCTGC 


7800 


TCTATGAAAG 


GTTGGGCTTC 


GGAATCGTTT 


7860 


CAAACAAAAA 


AACCACCGCT 


ACCAGCGGTG 


7920 


CTTTTTCCGA 


AGGTAACTGG 


CTTCAGCAGA 


7980 


TAGCCGTAGT 


TAGGCCACCA 


CTTCAAGAAC 


8040 


CTAATCCTGT 


TACCAGTGGC 


TGCTGCCAGT 


8100 


TCAAGACGAT 


AGTTACCGGA 


TAAGGCGCAG 


8160 


CAGCCCAGCT 


TGGAGCGAAC 


GACCTACACC 


8220 


GAAAGCGCCA 


CGCTTCCCGA AGGGAGAAAG 


A A A A 

8280 


GGAACAGGAG 


AGCGCACGAG 


GGAGCTTCCA 


8340 


GTCGGGTTTC 


GCCACCTCTG 


ACTTGAGCGT 


8400 


AGCCTATGGA 


AAAACGCCAG 


CAACGCCGAG 


8460 


ATTGGGTGGA 


CTAACCATGG 


GGGGAATTGC 


8520 


AATGGCCACT 


CAGCAATTCC 


AGCAGCTCCA 


8580 


TGAAAAATCA 


ATCTCTAACC 


TAGAAAAGTC 


8640 


GAATCGAAGG 


GGCCTAGACT 


TGTTATTTCT 


8700 


AGAAGAATGT 


TGCTTCTATG 


CGGACCACAC 


8760 


GAGAGAGAGG 


CTTAATCAGA 


GACAGAAACT 


8820 
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GTTTGAGTCA ACTCAAGGAT GGTTTGAGGG ACTGTTTAAC AGATCCCCTT GGTTTACCAC 8880 

CTTGATATCT ACCATTATGG GACCCCTCAT TGTACTCCTA ATGATTTTGC TCTTCGGACC 8940 

CTGCATTCTT AATCGATTAG TCCAATTTGT TAAAGACAGG ATATCAGTGG TCCAGGCTCT 9000 

AGTTTTGACT CAACAATATC ACCAGCTGAA GCCTATAGAG TACGAGCCAT AGATAAAATA 9060 

AAAGATTTTA TTTAGTCTCC AGAAAAAGGG GGG 9093 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

GACTAACCTT GATTCCCTGG AGGCGGGGGT CTTTCATTTG GGGGCT 46 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4834 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 





TGAAGAATAA 


AAAATTACTG 


GCCTCTTGTG 


AGAACATGAA 


CTTTCACCTC GGAGCCCACC 


60 


45 


CCCTCCCATC 


TGGAAAACAT 


ACTTGAGAAA AACATTTTCT 


GGAACAACCA CA6AATGTTT 


120 




CAACAGGCCA 


GATGTATTGC 


CAAACACAGG 


ATATGACTCT 


TTGGTTGAGT AAATTTGTGG 


180 


50 


TTGTTAAACT 


TCCCCTATTC 


CCTCCCCATT 


CCCCCTCCCA 


GTTTGTGGTT TTTTCCTTTA 


240 


AAAGCTTGTG 


AAAAATTTGA 


GTCGTCGTCG 


AGACTCCTCT 


ACCCTGTGCA AAGGTGTATG 


300 




AGTTTCGACC 


CCAGAGCTCT 


GTGTGCTTTC 


TGTTGCTGCT 


TTATTTCGAC CCCAGAGGTC 


360 


55 


TGGTCTGTGT 


GCTTTCATGT 


CGCTGCTTTA 


TTAAATCTTA 


CCTTCTACAT TTTATGTATG 


420 




GTCTCAGTGT 


CTTCTTGGGT 


ACGCGGCTGT 


CCCGGGACTT 


GAGTGTCTGA GTGAGGGTCT 


480 


60 


TCCCTCGAGG 


GTCTTTCATT 


TGGTACATGG 


GCCGGGAATT 


CGAGAATCTT TCATTTGGTG 


540 


CATTGGCCG6 


GAATTCGAAA ATCTTTCATT 


TGGTGCATTG 


GCCGGGAAAC AGGGCGACCA 


600 




CCCAGAGGTC 


CTAGACCCAC 


TTAGAGGTAA 


GATTCTTTGT 


TCTGTTTTGG TCTGATGTCT 


660 


65 


GTGTTCTGAT 


GTCTGTGTTC 


TGTTTCTAAG 


TCTGGTGCGA 


TCGCAGTTTC AGTTTTGCGG 


720 
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ACGCTCAGTG AGACCGCGCT CCGAGAGGGA 
GTGCGGGGTG GATAAGGATA GACGTGTCCA 780 
GGTGTCCACC GTCCGTTCGC CCTGGGAGAC GTCCCAGGAG GAACAGGGGA GGATCAGGGA 840 
CGCCTGGTGG ACCCCTTTGA AGGCCAAGAG ACCATTTGGG GTTGCGAGAT CGTGGGTTCG 900 
AGTCCCACCT CGTGCCCAGT TGCGAGATCG TGGGTTCGAG TCCCACCTCG TGTTTTGTTG 960 
CGAGATCGTG GGTTCGAGTC CCACCTCGCG TCTGGTCACG GGATCGTGGG TTCGAGTCCC 1020 
ACCTCGTGTT TTGTTGCGAG ATCGTGGGTT CGAGTCCCAC CTCGCGTCTG GTCACGGGAT 1080 
CGTGGGTTCG AGTCCCACCT CGTGCAGAGG GTCTCAATTG GCCGGCCTTA GAGAGGCCAT 1140 
CTGATTCTTC TGGTTTCTCT TTTTGTCTTA GTCTCGTGTC CGCTCTTGTT GTGACTACTG 1200 
TTTTTCTAAA AATGGGACAA TCTGTGTCCA CTCCCCTTTC TCTGACTCTG GTTCTGTCGC 1260 
TTGGTAATTT TGTTTGTTTA CGTTTGTTTT TGTGAGTCGT CTATGTTGTC TGTTACTATC 1320 
TTGTTTTTGT TTGTGGTTTA CGGTTTCTGT GTGTGTCTTG TGTGTCTCTT TGTGTTCAGA 1380 
CTTGGACTGA TGACTGACGA CTGTTTTTAA GTTATGCCTT CTAAAATAAG CCTAAAAATC 1440 
CTGTCAGATC CCTATGCTGA CCACTTCCTT TCAGATCAAC AGCTGCCCTG CCTCCCACTC 1500 
CAACTCCAGA GAGCAGCCAG CGGGTCACAG TGGTCCCGCC CATGAACCTG GAGCCTAGGG 1560 
AAAAATGAGC TCGGAAATCC GGAGCAAATG AGGAGTGGTC CCTGAGAAGT CAGTGGCCTA 1620 
AATGTTGTGG CTGCTGAAGC AAAAGAAGAG GAGGCTGTTC GAGTAGCCGG CCAAGAGCGC 1680 
CGCGGGTTCC CAGGCAGCTT CTCATTCCCC TGTCCCTCCC ATCCCGTCTC TTGTTAACAG 1740 
AAAAACTGCT TTCACTTTGA GATATGAGTG GCCCGATACA GCCAGCTGTG AGAGCTGTAC 1800 
TCCCTTCCCT GCCCCACGTG TTTTCTCTTC TCAGGCGACC CCTCCCTGAG CTGCTGGCAG 1860 
TGAGTCTGTT CTAAGCTCCA GTGAGGGAGG CATCCGCCCA CTTGGGGCTT CTGTCCAAGG 1920 
TAAGGAGCAC CTGTGAGTCT AACTGCCAGG CTCTGATGGG GGTCTCGTCT CTGTGGGACT 1980 
AGAAAGTGTC CCAACAATCT GACCAAGGTA ACAGGAAGTT AAGACAAAGA CAGAGACCAA 2040 
AGTCAGAATC AGAGCTGTGC TGTGAGACAA AAAGATAAAA AAAATAAAAT GCTGGCCACA 2100 
AAAGTCAGGA AAACTAGAAA ACTTAGATAG TACCTGGCAA CAAAAGAAAG CTTTTGGCTA 2160 
AAGATCAACG TGTATACTGT AAAGAAAATG AGCACTGGGT GAGAGACTGC CCCAACAAAA 2220 
AGAAGAGGAG CCCCCCTCAT GACCAAACCC TTCACCTGTT CGTGGCTAAA AGTAAAGAGA 2280 
TAACAAAAGG GGTGCTAACA CAGAAGCTGA GTCCTTAAAA GAGTCCGGTG GCCTACCTGT 2340 
TGAAGCAGCT AAAAAAGAGA CTGTGTTTCA TACTCCTCCA CTGACCAGTG CAAAACAAGC 2400 
TAAAT^GTTC CTGGGCACTG CGGGCTTTTG CAGATTGTGG ATTCCAGGTT TTGCTGAGTT 2460 
AAAGAGATAA ACAGCCCTTC GTATAGAAAA ATAAAAAACA ACCTTGGATG TCCTTGGATG 2520 
CTATTGAGAC TGCCCTAATG TTGTCCCCAG CTATGGGACT CCTAGATGTG ACTGAGAACA 2580 
AAGGTATTGC CAAAGAAGTT CTTACTCAGA GATTGGGACC CTGAAAAAGA CCTGTGGCAT 2640 
ACTTGTAAGA AATTAGACCT GGTGGCTGTA AGATGGCCTG CTTGTCTGCA CATAGTGGCT 2700 
TCTGGTCAAG GACGCAGATA AATTGACTCT GAGACAAAAC TTGGCACATG TCCTAGAAAG 2760 
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TGTGGTTCAG CCCCCATGAC CGATGGCTGA 



TCCCCTGACC GATGGACACA TTGTCAGAGC 

5 

CCCCTGCTAT CCTCGATCTC ACTACTGCCT 



ATTCTGGCAG AAGAAACTCA TACTCGAAAT 
10 AGAGTTTGAG CTGGTACACG GATGGCAGTA 



GGACAGCAGT GCAGTGGTGG ACAGAAAGCA 



GACTTCAGCC CACAAAGCCA AACTTGTGGC 

15 

AAAAAAGTCT ACACG6ACAG CAGGTATGCT 

AAGAGAACTG TTGACATCTG CAGAGAAAGA 
TGGCAAATCT AACCGCCCAG GCATCCTAAA 

20 

CAAGTTATAG ACAAATTAAG ACTGGTAAAA 



AAAAGAAAAC TAGTCCTCTC ATGAGAAGAC 
25 ACTGGAAAAA ATATGTGTAT GAATACCTTC 



AAAGCTTTTC CTTGTAAAAC GAGACTGATC 



ATTTTCCAAG GTTCGGAGTG CCAAAAGCAA 

30 

CCCAGGTAAG TCAGGGTGTG GCCAAGTATT 



ACAGACCTCA GAGCTCAGGA AAGATAAAAA 
35 CAAAATTAAT CCTAGAGACT GGCACAGACT 



A6AACTGAGA ATACTCCCTC TTGATTCGGT 



CCTATGCCAT CACTGTCTTA AATGATGTGT 

40 

TATGTTAAGT TAAAAGGCTT GCAGGTGGTG 



GTGAACAAGC TGGGTACCCC AAGGACATCT 
45 TACACCTGCG TCATGCTGAG ACCCTCAAGC 
TTACTAATCT GCCTTATTCT GTTTTTGTTC 



CTCCACATAG AGATATAGAC TTCTGAAATT 

50 

GGGGAATGAA 6AATAAAAAA TTACTGGCCT 



CCCACCCCCT CCCATCTGGA AAACATACTT 
55 ATGTTTCAAC AGGCCAGATG TATTGCCAAA 



TTGTGGTTGT TAAACTTCCC CTATTCCCTC 



CCTTTAAAAG CTTGTGAAAA ATTTGAGTCG 

60 

TGTATGAGTT TCGACCCCAG AGCTCTGTGT 



GAGCTCTGGT CTGTGTGCTT TCATGTCGCT 
65 TGTATGGTCT CAGTGTCTTC TTGGGTACGC 



CTAACGCTCT 


TGAAAACATT 


ATCCAACTGT 


2820 


TTTTTTTGAC 


TGAACGAGTG 


ACCTTCGCTC 


2880 


GAGACTTCAC 


CTACTCATCA 


TTGTGCTGAC 


2940 


l9 A i U 1 OAAbo 


ATCAGATCAG 




•3 A rt A 
JUDO 




»1»7VH ^^/^rp TV TV 

TAAGbvjTAAii 


CvsbAAbbLuC) 


O A C A 




bULAbCAuCC 


i Uuu i AAAvjva 




1 1 lAAi AL.AA 


/^r'm/^m^cn JV TV TV 

bUiCTbrAAA 


qir^/^mjv n a n a jv 
ibuIAAAAAA 


"^l QA 




TTVA^IV/^TV/^^TV IV 

TACAGAGCAA 


TATAwAlaAuA 




CCTAAGATGC 
GAGCAATGAT 


*MITl IV TV 

TGTGGCTAAA 
CCTGACAGTC 


3v/^n K anf/^ivoTv 
AGAAATCAGA 

TGAAGACTAT 


■a Oft A 

3360 


AAAACCCTGT 


ATAAAATAGT 


AAAAACTGAA 


3420 


AGACCTGACA 


TCTACTGAAA 


AATAGACTTT 


3480 


TAGTTTTTGT 


GAACGTTCTC 


AAGATGGATA 


3540 


AGATAGTCAT 


CAAGAAGATT 


GTTAAAGAAA 


3600 


TAGTGTCAGA 


TAATGGTCCT 


GCCTTTGTTG 


3660 


TAGAGGTCAA ATGAAAATTC 


CATTGTGTGT 


3720 


AGAATAAATA AAACTCTAAA 


CAGACCTTGA 


3780 


TACTTGGTAC 


TCCTTCCCCT 


TGCCCTATTT 


3840 


TTTACTCTTT 


TTAAGATCCT 


TTATGGGGCT 


3900 


TTAAACCTAT 


GTTGTTATAA 


TAATGATCTA 


3960 


CAGAAAGAAG 


TCTGGTCACA 


ACTGGCTACA 


4020 


TACCAGTTCC 


AGCCAGAGAT 


CTGATCTACG 


4080 


CTCACTAAAA 


GGGTCCCTGC 


CTAGTTCTGT 


4140 


CCATGTTAAA 


GATAGAGTAA 


ATGCAGTATT 


4200 


CTAAGATTAG 


AATTATTTAC 


AAGAAGAAGT 


4260 


CTTGTGAGAA 


CATGAACTTT 


CACCTCGGAG 


4.320 


GAGAAAAACA 


TTTTCTGGAA 


CAACCACAGA 


4380 


CACAGGATAT 


GACTCTTTGG 


TTGAGTAAAT 


4440 


CCCATTCCCC 


CTCCCAGTTT 


GTGGTTTTTT 


4500 


TCGTCGAGAC 


TCCTCTACCC 


TGTGCAAAGG 


4560 


GCTTTCTGTT 


GCTGCTTTAT 


TTCGACCCCA 


4620 


GCTTTATTAA ATCTTACCTT 


CTACATTTTA 


4680 


GGCTGTCCCG 


GGACTTGAGT 


GTCTGAGTGA 


4740 
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GGGTCTTCCC TCGAGGGTCT TTCATTT6GT ACATGGGCCG GGAATTCGAG AATCTTTCAT 4800 
TTGGTGCATT GGCCGGGAAT TCGAAAATCT TTCA 4834 
(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4518 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 



CACCTGACGC 


GCCCTGTAGC 


GGCGCATTAA 


GCGCGGCGGG 


TGTGGTGGTT 


ACGCGCAGCG 


60 


TGACCGCTAC 


ACTTGCCAGC 


GCCCTAGCGC 


CCGCTCCTTT 


CGCTTTCTTC 


CCTTCCTTTC 


120 


TCGCCACGTT 


CGCCGGCTTT 


CCCCGTCAAG 


CTCTAAATCG 


GGGGCTCCCT 


TTAGGGTTCC 


180 


GATTTAGTGC 


TTTACGGCAC 


CTCGACCCCA 


AAAAACTTGA 


TTAGGGTGAT 


GGTTCACGTA 


240 


GTGGGCCATC 


GCCCTGATAG 


ACGGTTTTTC 


GCCCTTTGAC 


GTTGGAGTCC 


ACGTTCTTTA 


300 


ATAGTGGACT 


CTTGTTCCAA 


ACTGGAACAA 


CACTCAACCC 


TATCTCGGTC 


TATTCTTTTG 


360 


ATTTATAAGG 


GATTTTGCCG 


ATTTCGGCCT 


ATTGGTTAAA AAATGAGCTG 


ATTTAACAAA 


420 


AATTTAACGC 


GAATTTTAAC 


AAAATATTAA 


CGCTTACAAT 


TTACGCGTTA AGATACATTG 


480 


ATGAGTTTGG 


ACAAACCACA 


ACTAGAATGC 


AGTGAAAAAA ATGCTTTATT 


TGTGAAATTT 


540 


GTGATGCTAT 


TGCTTTATTT 


GTAACCATTA 


TAAGCTGCAA 


TAAACAAGTT 


AACAACAACA 


600 


ATTGCATTCA 


TTTTATGTTT 


CAGGTTCAGG 


GGGAGGTGTG 


GGAGGTTTTT 


TAAAGCAAGT 


660 


AAAACCTCTA 


CAAATGTGGT 


ATGGCTGATT 


ATGATCATGA ACAGACTGTG 


AGGACTGAGG 


720 


GGCCTGAAAT 


GAGCCTTGGG 


ACTGTGAATC 


TAAAATACAC 


AAACAATTAG 


AATCAGTAGT 


780 


TTAACACATT 


ATACACTTAA 


AAATTGGATC 


TCCATTCGCC 


ATTCAGGCTG 


CGO^CTGTT 


840 


6GGAAGGGCG 


ATCGGTGCGG 


GCCTCTTCGC 


TATTACGCCA GCTGGCGAAA GGGGGATGTG 


900 


CTGCAAGGCG 


ATTAAGTTGG 


GTAACGCCAG 


GGTTTTCCCA 


GTCACGACGT 


TGTAAAACGA 


960 


CGGCCAGTGA 


ATTGTAATAC 


GACTCACTAT 


AGGGCGAATT 


GGGTACACTT 


ACCTGGTACC 


1020 


CCACCCGGGT 


GGAAAATCGA 


TGGGCCCGCG 


GCCGCTCTAG 


AAGTACTCTC 


GAGAAGCTTT 


1080 


TTGAATTCTT 


T6GATCCACT 


AGTGTCGACC 


TGCAGGCGCG 


CGAGCTCCAG 


CTTTTGTTCC 


1140 


CTTTAGTGAG 


GGTTAATTTC 


GAGCTTGGCG 


TAATCAAGGT 


CATAGCTGTT 


TCCTGTGTGA 


1200 


AATTGTTATC 


CGCTCACAAT 


TCCACACAAT 


ATACGAGCCG 


GAAGTATAAA 


GTGTAAAGCC 


1260 


TGGGGTGCCT 


AATGAGTGAG 


CTAACTCACA 


GTAATT6CGG 


CTAGCGGATC 


TGACGGTTCA 


1320 


CTAAACCAGC 


TCTGCTTATA 


TAGACCTCCC 


ACCGTACACG 


CCTACCGCCC 


ATTTGCGTCA 


1380 


ATGGGGCGGA 


GTTGTTACGA 


CATTTTGGAA AGTCCCGTTG 


ATTTTGGTGC 


CAAAACAAAC 


1440 
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TCCCATTGAC 


GTCAATGGGG 


TGGAGACTTG 


GAAATCCCCG 


TGAGTCAAAC 


CGCTATCCAC 


1500 


GCCCATTGAT 


GTACTGCCAA 


AACCGCATCA 


CCATGGTAAT 


AGCGATGACT 


AATACGTAGA 


1560 


TGTACTGCCA 


AGTAGGAAAG 


TCCCATAAGG 


TCATGTACTG 


GGCATAATGC 

^#\^%^44X flS** ^ \*%^ 


CAGGCGGGCC 


1620 


ATTTACCGTC 


ATTGACGTCA 


ATAGGGGGCG 


TACTTGGCAT 


ATGATACACT 


TGATGTACTG 


1680 


CCAAGTGGGC 


AGTTTACCGT 


AAATACTCCA 


CCCATTGACG 


TCAATGGAAA 

m ^^4 44 4 A X^^#4 A4 44 4 


GTCCCTATTG 


1740 


GCGTTACTAT 


GGGAACATAC 


GTCATTATTG 


ACGTCAATGG 


GCGGGGGTCG 


TTGGGCGGTC 


1800 

Jb W V W 


AGCCAGGCGG 


GCCATTTACC 


GTAAGTTATG 


TAACGCGGAA 


CTCCATATAT 

4 %^%^4 4 A 4 4 A 4 4 ^ 


GGGCTATGAA 


1860 

A W W V 


CTAATGACCC 
ACAGAATCAG 


CGTAATTGAT 
GGGATAACGC 


TACTATTAAT 
AGGAAAGAAC 


AACTAATGCA 
ATGTGAGCAA 


TGGCGGTAAT 
AAGGCCAGCA 


AC6GTTATCC 
AAAGGCCAGG 


1920 
1980 


AACCGTAAAA 


AGGCCGCGTT 


GCTGGCGTTT 


TTCCATAGGC 


TCCGCCCCCC 


TGACGAGCAT 


2040 


CACAAAAATC 


GACGCTCAAG 


TCAGAGGTGG 


CGAAACCCGA 


CAGGACTATA 


AAGATACCAG 


2100 


GCGTTTCCCC 


CTGGAAGCTC 


CCTCGTGCGC 


TCTCCTGTTC 


CGACCCTGCC 


GCTTACCGGA 


2160 


TACCTGTCCG 


CCTTTCTCCC 


TTCGGGAAGC 


GTGGCGCTTT 


CTCATAGCTC 


ACGCTGTAGG 


2220 


TATCTCAGTT 


CGGTGTA6GT 


CGTTCGCTCC 


AAGCTGGGCT 


GTGTGCACGA 


ACCCCCCGTT 


2280 


CA6CCCGACC 


GCTGCGCCTT 


ATCCGGTAAC 


TATCGTCTTG 


AGTCCAACCC 


GGTAAGACAC 


2340 


GACTTATCGC 


CACTGGCAGC 


AGCCACTGGT 


AACAGGATTA 


GCAGAGCGAG 


GTATGTAGGC 


2400 


GGTGCTACAG 


AGTTCTTGAA 


GTGGTGGCCT 


AACTACGGCT 


ACACTAGAAG 


GACAGTATTT 


2460 


GGTATCTGCG 


CTCTGCTGAA 


GCCAGTTACC 


TTCGGAAAAA 


GAGTTGGTAG 


CTCTTGATCC 


2520 


GGCAAACAAA 


CCACCGCTGG 


TAGCGGTGGT 


TTTTTTGTTT 


GCAAGCAGCA 


GATTACGCGC 


2580 


AGAAAAAAAG 


GATCTCAAGA 


AGATCCTTTG 


ATCTTTTCTA 


CGGGGTCTGA 


CGCTCAGTGG 


2640 


AACGAAAACT 


CACGTTAAGG 


GATTTTGGTC 


ATGAGATTAT 


CAAAAAGGAT 


CTTCACCTAG 


2700 


ATCCTTTTAA 


ATTAAAAATG 


AAGTTTTAAA 


TCAATCTAAA 


GTATATATGA 


GTAACCTGAG 


2760 


GCTATGGCAG 


GGCCTGCCGC 


CCCGACGTTG 


GCTGCGAGCC 


CTGGGCCTTC 


ACCCGAACTT 


2820 


GG6GGGTGGG 


GTGGG6AAAA 


GGAAGAAACG 


CGGGCGTATT 


GGCCCCAATG 


GGGTCTCGGT 


2880 


GGGGTATCGA 


CAGAGTGCCA 


GCCCTGGGAC 


CGAACCCCGC 


GTTTATGAAC 


AAACGACCCA 


2940 


ACACCGTGCG 


TTTTATTCTG 


TCTTTTTATT 


GCCGTCATAG 


CGCGGGTTCC 


TTCCGGTATT 


3000 


GTCTCCTTCC 


GTGTTTCAGT 


TAGCCTCCCC 


CTAGGGTGGG 


CGAAGAACTC 


CAGCATGAGA 


3060 


TCCCCGCGCT 


GGAGGATCAT 


CCAGCCGGCG 


TCCCGGAAAA 


CGATTCCGAA 


GCCCAACCTT 


3120 


TCATAGAAGG 


CGGCGGTGGA 


ATCGAAATCT 


CGTGATGGCA 


GGTTGG6CGT 


CGCTTGGTCG 


3180 


GTCATTTCGA 


ACCCCAGAGT 


CCCGCTCAGA 


AGAACTCGTC 


AAGAAGGCGA 


TAGAAGGCGA 


3240 


TGCGCTGCGA 


ATCGGGAGCG 


GCGATACCGT 


AAAGCACGAG 


GAAGCGGTCA 


GCCCATTCGC 


3300 


CGCCAAGCTC 


TTCAGCAATA 


TCACGGGTAG 


CCAACGCTAT 


GTCCTGATAG 


CGGTCCGCCA 


3360 


CACCCAGCCG 


GCCACAGTCG 


ATGAATCCAG 


AAAAGCGGCC 


ATTTTCCACC 


ATGATATTCG 


3420 












a T p r* T r* p r* r* 


Wi CCsGGCAl b 


C 1 LCjCO 1 i CaA 


1 A Qf\ 










1 TUQTCCAGA 


i UAx ruAT 




CGACAAGACC 


GGCTTCCATC 


CGAGTACGTG 


CTCGCTCGAT 


GCGATGTTTC 


GCTTGGTGGT 


3600 


CGAATGGGCA 


GGTAGCCGGA 


TCAAGCGTAT 


GCA6CCGCCG 


CATTGCATCA 


GCCATGATGG 


3660 


ATACTTTCTC 


GGCAGGAGCA 


AGGTGAGATG 


ACAGGAGATC 


CTGCCCCGGC 


ACTTCGCCCA 


3720 


ATAGCAGCCA 


GTCCCTTCCC 


GCTTCAGTGA 


CAACGTCGAG 


CACAGCTGCG 


CAAGGAACGC 


3780 


CCGTCGTGGC 


CAGCCACGAT 


AGCCGCGCTG 


CCTCGTCTTG 


CAGTTCATTC 


AGGGCACCGG 


3840 


ACAGGTCGGT 
CATCAGAGCA 


CTTGACAAAA 
GCCGATTGTC 


AGAACCGGGC 
TGTTGTGCCC 


GCCCCTGCGC 
AGTCATAGCC 


TGACAGCCGG 
GAATAGCCTC 


AACACGGCGG 
TCCACCCAACa 


3900 

jybu 


CGGCCGGAGA 


ACCTGCGTGC 


AATCCATCTT 


GTTCAATCAT 


GCGAAACGAT 


CCTCATCCTG 


4UZ0 


TCTCTTGATC 


GATCTTTGCA 


AAAGCCTAGG 


CCTCCAAAAA AGCCTCCTCA 


CTACTTCTGG 


4080 


AATAGCTC7VG 


AGGCCGAGGC 


GGCCTC6GCC 


TCTGCATAAA 


TAAAAAAAAT 


TAGTCAGCCA 


4140 


T6G6GCGGAG 


AATGGGCGGA 


ACTGGGCGGA 


GTTAG66GCG 


GGATGGGCGG 


Ik ^^nm Ik ^^^^^^^ 

AGTTAGGGGC 


4200 


GGGACTATGG 


TTGCTGACTA 


ATTGAGATGC 


ATGCTTTGCA 


TACTTCTGCC 


TGCTGGGGAG 


4260 


CCTGGGGACT 


TTCCACACCT 


GGTTGCTGAC 


TAATTGAGAT 


GCATGCTTTG 


CATACTTCTG 


4320 


CCTGCTGGGG 


AGCCTGGGGA 


CTTTCCACAC 


CCTAACTGAC 


ACACATTCCA 


CAGCTGGTTC 


4380 


TTTCCGCCTC 


AGGACTCTTC 


CTTTTTCAAT 


ATTATTGAAG 


CATTTATCAG 


GGTTATTGTC 


4440 


TCATGAGCGG 


ATACATATTT 


GAATGTATTT 


AGAAAAATAA ACAAATAGGG 


GTTCCGCGCA 


4500 


CATTTCCCCG 


AAAAGTGC 










4518 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

CTCCACATAG AGATATAGAC TTCTG 25 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

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(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
CGATCTTATT AATTAACTGG AGTTTTGAGC CCRMCCCCTC CCATC 45 
(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5594 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 



TGCATTAGTT 


ATTAATAGTA ATCAATTACG 


GGGTCATTAG 


TTCATAGCCC 


ATATATGGAG 


60 


TTCCGCGTTA 


CATAACTTAC 


GGTAAATGGC 


CCGCCTGGCT 


GACCGCCCAA 


CGACCCCCGC 


120 


CCATTGACGT 


CAATAATGAC 


GTATGTTCCC 


ATAGTAACGC 


CAATAGGGAC 


TTTCCATT6A 


180 


CGTCAATGGG 


TGGAGTATTT 


ACGGTAAACT 


GCCCACTTGG 


CAGTACATCA 


AGTGTATCAT 


240 


ATGCCAAGTA 


CGCCCCCTAT 


TGACGTCAAT 


GACGGTAAAT 


GGCCCGCCTG 


GCATTATGCC 


300 


CAGTACATGA 


CCTTATGGGA 


CTTTCCTACT 


TGGCAGTACA 


TCTACGTATT 


AGTCATCGCT 


360 


ATTACCATGG 


TGATGCGGTT 


TTGGCAGTAC 


ATCAATGGGC 


GTGGATAGCG 


GTTTGACTCA 


420 


CGGGGATTTC 


CAAGTCTCCA CCCCATTGAC 


GTCAATGGGA 


GTTTGTTTTG 


GCACCAAAAT 


480 


CAACGGGACT 


TTCCAAAATG 


TCGTAACAAC 


TCCGCCCCAT 


TGACGCAAAT 


GGGCGGTAGG 


540 


CGT6TACGGT 


GGGAGGTCTA 


TATAAGCAGA 


GCTGGTTTAG 


TGAACCGTCA 


GATCCGCGCC 


600 


AGTCCTCCGA 


TTGACTGAGT 


CGCCCGGGTA 


CCCGTGTATC 


CAATAAACCC 


TCTTGCAGTT 


660 


GCATCCGACT 


TGTGGTCTCG 


CTGTTCCTTG 


GGAGGGTCTC 


CTCTGAGTGA 


TTGACTACCC 


720 


GTCAGCGGGG 


GTCTTTCATT 


TGGGGGCTCG 


TCCGGGATCG 


GGAGACCCCT 


GCCCAGGGAC 


780 


CACCGACCCA 


CCACCGGGAG 


GTAAGCTGGC 


CAGCAACTTA 


TCTGTGTCTG 


TCCGATTGTC 


840 


TAGTGTCTAT 


GACTGATTTT 


ATGCGCCTGC 


GTCGGTACTA 


GTTAGCTAAC 


TAGCTCTGTA 


900 


TCTGGCGGAC 


CCGTGGTGGA ACTGACGAGT 


TCGGAACACC 


CGGCCGCAAC 


CCTGGGAGAC 


960 


GTCCCAGGAG 


GAACAGGGGA 


GGATCAGGGA 


CGCCTGGTGG 


ACCCCTTTGA 


AGGCCAAGAG 


1020 


ACCATTTGGG 


GTTGCGAGAT 


CGTGGGTTCG 


AGTCCCACCT 


CGTGCCCAGT 


TGCGAGATCG 


1080 


TGGGTTCGAG 


TCCCACCTCG 


TGTTTTGTTG 


CGAGATCGTG 


GGTTCGAGTC 


CCACCTCGCG 


1140 


TCTGGTCACG 


GGATCGTGGG 


TTCGAGTCCC 


ACCTCGTGTT 


TTGTTGCGAG 


ATCGTGGGTT 


1200 


CGAGTCCCAC 


CTCGCGTCT6 


GTCACGGGAT 


CGTGGGTTCG 


AGTCCCACCT 


CGTGCAGAGG 


1260 


GTCTCAATTG 


GCCGGCCTTA GAGAGGCCAT 


CTGATTCTTC 


TGGTTTCTCT 


TTTTGTCTTA 


1320 


GTCTCGTGTC 

* 


CGCTCTTGTT 


GTGACTACTG 


TTTTTCTAAA AATGGGACAA 


TCTGTGTCCA 


1380 


CTCCCCTTTC 


TCTGACTCTG 


GTTCTGTCGC 


TTGGTAATTT 


TGTTTGTTTA 


CGTTTGTTTT 


1440 
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5 






TGTGAGTCGT CTATGTTGTC TGTTACTATC TTGTTTTTGT TTGTGGTTTA CGGTTTCTGT 1500 

GTGTGTCTTG TGTGTCTCTT TGTGTTCAGA CTTGGACTGA TGACTGACGA CTGTTTTTAA 1560 

GTTATGCCTT CTAAAATAAG CCTAAAAATC CTGTCAGATC CCTATGCTGA CCACTTCCTT 1620 

TCAGATCAAC AGCTGCCCTT ACGTATCGAT GGATCCCTCG ACTAACTAAT AGCCCATTCT 1680 

CCAAGGTCGA GCGGGATCAA TTCCGCCCCC CCCCTAACGT TACTGGCCGA AGCCGCTTGG 1740 

AATAAGGCCG GTGTGCGTTT GTCTATATGT TATTTTCCAC CATATTGCCG TCTTTTGGCA 1800 

ATGTGAGGGC CCGGAAACCT GGCCCTGTCT TCTTGACGAG CATTCCTAGG GGTCTTTCCC 1860 

CTCTCGCCAA AGGAATGCAA GGTCTGTTGA ATGTCGTGAA GGAAGCAGTT CCTCTGGAAG 1920 

CTTCTTGAAG ACAAACAACG TCTGTAGCGA CCCTTTGCAG GCAGCGGAAC CCCCCACCTG 1980 

GCGACAGGTG CCTCTGCGGC CAAAAGCCAC GTGTATAAGA TACACCTGCA AAGGCGGCAC 2040 

AACCCCAGTG CCACGTTGTG AGTTGGATAG TTGTGGAAAG AGTCAAATGG CTCTCCTCAA 2100 

GCGTATTCAA CAAGGGGCTG AAGGATGCCC AGAAGGTACC CCATTGTATG GGATCTGATC 2160 

TGGGGCCTCG GTGCACATGC TTTACATGTG TTTAGTCGAG GTTAAAAAAA CGTCTAGGCC 2220 

CCCCGAACCA CGGGGACGTG GTTTTCCTTT GAAAAACACG ATAATAATCA TGGCTACAGG 2280 

CTCCCGGACG TCCCTGCTCC TGGCTTTTGG CCTGCTCTGC CTGCCCTGGC TTCAAGAGGG 2340

CAGTGCCTTC CCAACCATTC CCTTATCCAG GCTTTTTGAC AACGCTATGC TCCGCGCCCA 2400 

TCGTCTGCAC CAGCTGGCCT TTGACACCTA CCAGGAGTTT GAAGAAGCCT ATATCCCAAA 2460

GGAACAGAAG TATTCATTCC TGCAGAACCC CCAGACCTCC CTCTGTTTCT CAGAGTCTAT 2520 

TCCGACACCC TCCAACAGGG AGGAAACACA ACAGAAATCC AACCTAGAGC TGCTCCGCAT 2580

CTCCCTGCTG CTCATCCAGT CGTGGCTGGA GCCCGTGCAG TTCCTCAGGA GTGTCTTCGC 2640

CAACAGCCTG GTGTACGGCG CCTCTGACAG CAACGTCTAT GACCTCCTAA AGGACCTAGA 2700 

GGAAGGCATC CAAACGCTGA TGGGGAGGCT GGAAGATGGC AGCCCCCGGA CTGGGCAGAT 2760 

CTTCAAGCAG ACCTACAGCA AGTTCGACAC AAACTCACAC AACGATGACG CACTACTCAA 2820 

GAACTACGGG CTGCTCTACT GCTTCAGGAA GGACATGGAC AAGGTCGAGA CATTCCTGCG 2880 

CATCGTGCAG TGCCGCTCTG TGGAGGGCAG CTGTGGCTTC TAGCTGCCCG GGTGGCATCC 2940

TGTGACCCCT CCCCAGTGCC TCTCCTGGCC CTGGAAGTTG CCACTCCAGT GCCCACCAGC 3000 

CTTGTCCTAA TGTGTGTCAG TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA 3060 

GTATGCAAAG CATGCATCTC AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC 3120 

CAGCAGGCAG AAGTATGCAA AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC 3180 

TAACTCCGCC CATCCCGCCC CTAACTCCGC CCAGTTCGGC CCATTCTCCG CCCCATGGCT 3240

GACTAATTTT TTTTATTTAT GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA 3300

AGTAGTGAGG AGGCTTTTTT GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA 3360 

GCACTCAGGG CGCAAGGGCT GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA 3420 



ACGGTGCTGA CCCCGGATGA ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG 3480

CGCAAAGAGA AAGCAGGTAG CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT 3540

TTTATGGACA GCAAGCGAAC CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA 3600

GCCCTGCAAA GTAAACTGGA TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC 3660

AAGATCTGAT CAAGAGACAG GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA 3720

CGCAGGTTCT CCGGCCGCTT GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC 3780

AATCGGCTGC TCTGATGCCG CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT 3840

TGTCAAGACC GACCTGTCCG GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC 3900

GTGGCTGGCC ACGACGGGCG TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG 3960

AAGGGACTGG CTGCTATTGG GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC 4020

TCCTGCCGAG AAAGTATCCA TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC 4080

GGCTACCTGC CCATTCGACC ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT 4140

GGAAGCCGGT CTTGTCGATC AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC 4200

CGAACTGTTC GCCAGGCTCA AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA 4260

TGGCGATGCC TGCTTGCCGA ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA 4320

CTGTGGCCGG CTGGGTGTGG CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT 4380

TGCTGAAGAG CTTGGCGGCG AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC 4440

TCCCGATTCG CAGCGCATCG CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT 4500

CTGGGGTTCG AAATGACCGA CCAAGCGACG CCCAACCTCC AGAAAAAGGG GGGAATGAAA 4560

GACCCCACCT GTAGGTTTGG CAAGCTAGCT TAAGTAACGC CATTTTGCAA GGCATGGAAA 4620

AATACATAAC TGAGAATAGA GAAGTTCAGA TCAAGGTCAG GAACAGATGG AACAGCTGAA 4680

TATGGGCCAA ACAGGATATC TGTGGTAAGC AGTTCCTGCC CCGGCTCAGG GCCAAGAACA 4740

GATGGAACAG CTGAATATGG GCCAAACAGG ATATCTGTGG TAAGCAGTTC CTGCCCCGGC 4800

TCAGGGCCAA GAACAGATGG TCCCCAGATG CGGTCCAGCC CTCAGCAGTT TCTAGAGAAC 4860

CATCAGATGT TTCCAGGGTG CCCCAAGGAC CTGAAATGAC CCTGTGCCTT ATTTGAACTA 4920

ACCAATCAGT TCGCTTCTCG CTTCTGTTCG CGCGCTTCTG CTCCCCGAGC TCAATAAAAG 4980

AGCCCACAAC CCCTCACTCG GGGCGCCAGT AATCTGCTGC TTGCAAACAA AAAAACCACC 5040

GCTACCAGCG GTGGTTTGTT TGCCGGATCA AGAGCTACCA ACTCTTTTTC CGAAGGTAAC 5100

TGGCTTCAGC AGAGCGCAGA TACCAAATAC TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA 5160

CCACTTCAAG AACTCTGTAG CACCGCCTAC ATACCTCGCT CTGCTAATCC TGTTACCAGT 5220

GGCTGCTGCC AGTGGCGATA AGTCGTGTCT TACCGGGTTG GACTCAAGAC GATAGTTACC 5280

GGATAAGGCG CAGCGGTCGG GCTGAACGGG GGGTTCGTGC ACACAGCCCA GCTTGGAGCG 5340

AACGACCTAC ACCGAACTGA GATACCTACA GCGTGAGCAT TGAGAAAGCG CCACGCTTCC 5400

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CGAAGGGAGA AAGGCGGACA GGTATCCGGT AAGCGGCAGG GTCGGAACAG GAGAGCGCAC 5460 

GAGGGAGCTT CCAGGGGGAA ACGCCTGGTA TCTTTATAGT CCTGTCGGGT TTCGCCACCT 5520 

CTGACTTGAG CGTCGATTTT TGTGATGCTC GTCAGGGGGG CGGAGCCTAT GGAAAAACGC 5580 

CAGCAACGCC GAGA 5594 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6561 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GATCCCCGGG TCGACCCGGG TCGACCCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT 60 

CCCCAGGCTC CCCAGCAGGC AGAAGTATGC AAAGCATGCA TCTCAATTAG TCAGCAACCA 120 

GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT GCAAAGCATG CATCTCAATT 180

AGTCAGCAAC CATAGTCCCG CCCCTAACTC CGCCCATCCC GCCCCTAACT CCGCCCAGTT 240 

CCGCCCATTC TCCGCCCCAT GGCTGACTAA TTTTTTTTAT TTATGCAGAG GCCGAGGCCG 300

CCTCGGCCTC TGAGCTATTC CAGAAGTAGT GAGGAGGCTT TTTTGGAGGC CTAGGCTTTT 360 

GCAAAAAGCT TCACGCTGCC GCAAGCACTC AGGGCGCAAG GGCTGCTAAA GGAAGCGGAA 420

CACGTAGAAA GCCAGTCCGC AGAAACGGTG CTGACCCCGG ATGAATGTCA GCTACTGGGC 480 

TATCTGGACA AGGGAAAACG CAAGCGCAAA GAGAAAGCAG GTAGCTTGCA GTGGGCTTAC 540 

ATGGCGATAG CTAGACTGGG CGGTTTTATG GACAGCAAGC GAACCGGAAT TGCCAGCTGG 600 

GGCGCCCTCT GGTAAGGTTG GGAAGCCCTG CAAAGTAAAC TGGATGGCTT TCTTGCCGCC 660

AAGGATCTGA TGGCGCAGGG GATCAAGATC TGATCAAGAG ACAGGATGAG GATCGTTTCG 720 

CATGATTGAA CAAGATGGAT TGCACGCAGG TTCTCCGGCC GCTTGGGTGG AGAGGCTATT 780 

CGGCTATGAC TGGGCACAAC AGACAATCGG CTGCTCTGAT GCCGCCGTGT TCCGGCTGTC 840 

AGCGCAGGGG CGCCCGGTTC TTTTTGTCAA GACCGACCTG TCCGGTGCCC TGAATGAACT 900 

GCAGGACGAG GCAGCGCGGC TATCGTGGCT GGCCACGACG GGCGTTCCTT GCGCAGCTGT 960 

GCTCGACGTT GTCACTGAAG CGGGAAGGGA CTGGCTGCTA TTGGGCGAAG TGCCGGGGCA 1020

GGATCTCCTG TCATCTCACC TTGCTCCTGC CGAGAAAGTA TCCATCATGG CTGATGCAAT 1080 

GCGGCGGCTG CATACGCTTG ATCCGGCTAC CTGCCCATTC GACCACCAAG CGAAACATCG 1140

CATCGAGCGA GCACGTACTC GGATGGAAGC CGGTCTTGTC GATCAGGATG ATCTGGACGA 1200 

AGAGCATCAG GGGCTCGCGC CAGCCGAACT GTTCGCCAGG CTCAAGGCGC GCATGCCCGA 1260 

CGGCGAGGAT CTCGTCGTGA CCCATGGCGA TGCCTGCTTG CCGAATATCA TGGTGGAAAA 1320 




TGGCCGCTTT TCTGGATTCA TCGACTGTGG 
CATAGCGTTG GCTACCCGTG ATATTGCTGA 

5 

CCTCGTGCTT TACGGTATCG CCGCTCCCGA 

TGACGAGTTC TTCTGAGCGG GACTCTGGGG 

10 CTGCCATCAC GAGATTTCGA TTCCACCGCC 
VjU 1 A 1 V^/iVauA

xoou

CVfZUCCeZC'TT
\^ x\Ml\^\*\»^m' X X

±4 4 V/

t\x V«V9L>C 1 X V« 1

V3AV^V3VrfV.>v>AnV«

X o^u

GTTTTCCGGG ACGCCGGCTG GATGATCCTC CAGCGCGGGG ATCTCATGCT GGAGTTCTTC 1680

GCCCACCCCG GAATTCGTAA TCTGCTGCTT GCAAACAAAA AAACCACCGC TACCAGCGGT 1740

GGTTTGTTTG CCGGATCAAG AGCTACCAAC TCTTTTTCCG AAGGTAACTG GCTTCAGCAG 1800

AGCGCAGATA CCAAATACTG TCCTTCTAGT GTAGCCGTAG TTAGGCCACC ACTTCAAGAA 1860

CTCTGTAGCA CCGCCTACAT ACCTCGCTCT GCTAATCCTG TTACCAGTGG CTGCTGCCAG 1920

TGGCGATAAG TCGTGTCTTA CCGGGTTGGA CTCAAGACGA TAGTTACCGG ATAAGGCGCA 1980

GCGGTCGGGC TGAACGGGGG GTTCGTGCAC ACAGCCCAGC TTGGAGCGAA CGACCTACAC 2040

CGAACTGAGA TACCTACAGC GTGAGCATTG AGAAAGCGCC ACGCTTCCCG AAGGGAGAAA 2100

GGCGGACAGG TATCCGGTAA GCGGCAGGGT CGGAACAGGA GAGCGCACGA GGGAGCTTCC 2160

AGGGGGAAAC GCCTGGTATC TTTATAGTCC TGTCGGGTTT CGCCACCTCT GACTTGAGCG 2220

TCGATTTTTG TGATGCTCGT CAGGGGGGCG GAGCCTATGG AAAAACGCCA GCAACGCCGA 2280

GATGCGCCGC CTCGAGTACA CCTGCGTCAT GCTGAGACCC TCAAGCCTCA CTAAAAGGGT 2340

CCCTGCCTAG TTCTGTTTAC TAATCTGCCT TATTCTGTTT TTGTTCCCAT GTTAAAGATA 2400

GAGTAAATGC AGTATTCTCC ACATAGAGAT ATAGACTTCT GAAATTCTAA GATTAGAATT 2460

ATTTACAAGA AGAAGTGGGG AATGAAGAAT AAAAAATTAC TGGCCTCTTG TGAGAACATG 2520

AACTTTCACC TCGGAGCCCA CCCCCTCCCA TCTGGAAAAC ATACTTGAGA AAAACATTTT 2580

CTGGAACAAC CACAGAATGT TTCAACAGGC CAGATGTATT GCCAAACACA GGATATGACT 2640

CTTTGGTTGA GTAAATTTGT GGTTGTTAAA CTTCCCCTAT TCCCTCCCCA TTCCCCCTCC 2700

CAGTTTGTGG TTTTTTCCTT TAAAAGCTTG TGAAAAATTT GAGTCGTCGT CGAGACTCCT 2760

CTACCCTGTG CAAAGGTGTA TGAGTTTCGA CCCCAGAGCT CTGTGTGCTT TCTGTTGCTG 2820

CTTTATTTCG ACCCCAGAGC TCTGGTCTGT GTGCTTTCAT GTCGCTGCTT TATTAAATCT 2880

TACCTTCTAC ATTTTATGTA TGGTCTCAGT GTCTTCTTGG GTACGCGGCT GTCCCGGGAC 2940

TTGAGTGTCT GAGTGAGGGT CTTCCCTCGA GGGTCTTTCA TTTGGTACAT GGGCCGGGAA 3000

TTCGAGAATC TTTCATTTGG TGCATTGGCC GGGAATTCGA AAATCTTTCA TTTGGTGCAT 3060

TGGCCGGGAA ACAGCGCGAC CACCCAGAGG TCCTAGACCC ACTTAGAGGT AAGATTCTTT 3120

GTTCTGTTTT GGTCTGATGT CTGTGTTCTG ATGTCTGTGT TCTGTTTCTA AGTCTGGTGC 3180

GATCGCAGTT TCAGTTTTGC GGACGCTCAG TGAGACCGCG CTCCGAGAGG GAGTGCGGGG 3240

TGGATAAGGA TAGACGTGTC CAGGTGTCCA CCGTCCGTTC GCCCTGGGAG ACGTCCCAGG 3300

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AGGAACAGGG GAGGATCA6G GACGCCTGGT 

GGGTTGCGAG ATCGTGGGTT CGAGTCCCAC 

5 AGTCCCACCT CGTGTTTTGT TGCGAGATCG 

CGGGATCGTG GGTTCGAGTC CCACCTCGTG 

ACCTCGCGTC TGGTCACGGG ATCGTGGGTT 
10 TGGCCGGCCT TAGAGAGGCC ATCTGATTCT 

TCCGCTCTTG TTGTGACTAC TGTTTTTCTA 

TCTCTGACTC TGGTTCTGTC GCTTGGTAAT 

IS 

GTCTATGTTG TCTGTTACTA TCTTGTTTTT 
TGTGTGTCTC TTTGTGTTCA GACTTGGACT 
20 TTCTAAAATA A6CCTAAAAA TCCTGTCAGA 
ACAGCTGCCC TGCCTCCCAC TCCAACTCCA 
CCCATGAACC TGGAGCCTAG GGAAAAATGA 

25 

TCCCTGAGAA GTCAGTGGCC TAAATGTTGT 
TCGAGTAGCC GGCCAAGAGC GCCGCGGGTT 
30 CCATCCCGTC TCTTGTTAAC AGAAAAACTG 
CAGCCAGCTG TGAGA6CTGT ACTCCCTTCC 
CCCCTCCCTG AGCTGCTGGC AGTGAGTCTG 

35 

CACTTGGGGC TTCTGTCCAA GGTAAGGAGC 
GGGGTCTCGT CTCTGTGGGA CTAGAAAGTG 
40 TTAAGACAAA GACAGAGACC AAAGTCAGAA 
AAAAAATAAA ATGCTGGCCA CAAAAGTCAG 
AACAAAAGAA A6CTTTT6GC TAAAGATCAA 

45 

GTGAGAGACT GCCCCAACAA AAAGAAGAGG 
TTCGTGGCTA AAAGTAAAGA GATAACAAAA 
50 AAGAGTCCGG TGGCCTACCT GTTGAAGCAG 
CACTGACCAG TGCAAAACAA GCTAAAAAGT 
GGATTCCAGG TTTTGCTGAG TTAAAGAGAT 

55 

CAACCTTGGA TGTCCTTGGA TGCTATTGAG 
CTCCTAGATG TGACTGA6AA CAAAGGTATT 
60 CCCT6AAAAA GACCTGTGGC ATACTTGTAA 
TGCTTGTCTG CACATAGTGG CTTCTGGTCA 
ACTTGGCACA TGTCCTAGAA AGTGTGGTTC 

65 

CTTGAAAACA TTATCCAACT GTTCCCCTGA 



GGACCCCTTT 


GAAGGCCAAG 


AGACCATTTG 


3360 


CTCGTGCCCA 


GTTGCGAGAT 


CGTGGGTTCG 


3420 


TGGGTTCGAG 


TCCCACCTCG 


CGTCTGGTCA 


3480 


TTTTGTTGCG 


AGATCGTGGG 


TTCGAGTCCC 


3540 


CGAGTCCCAC 

Ol3 1 1 1 v.. 1 


CTCGTGCAGA 
ui 1 1 1 XGIUl 


GGGTCTCAAT 

lAGiL.iL,GiG 


3600 


TV a R aTfIC/t2ir' 


AAiUlGiGlL. 


/^7V r*'pr»rTT**n» 
wAU 1 uL.Uv< i 1 




1 I l\Jl I LKal I 


ti* j\ ^/^fTtfnfn/^ ip fp 

lACGi 1 IGi 1 


i. i J. G 1 GAG 1 G 


Q A 




1 Av^GG 1 1 Tu 1 


G 1 G 1 G 1 Gj. G 1 




uAI \3ivL, L uAL. 


GACTGTTTTT 


TV T\ /^mm TV mcf^f^ 

AAGTTATGCC 


«5 AAA 

3900 


tp/^r^/^m TV fj\f*r*rn 

TUCCTATGCT 


GACCACTTCC 


fp TV ^ K m^Tv 

TTTCAGATCA 


3960 


GAGAGCAGCC 


AGCGGGTCAC 


AGTGGTCCCG 


4020 


uCTCGGAAAT 


^ /^/^ ^ T\ TV TV TV 

CCGGAGCAAA 


TV TV ^fn^^ 

TGAGGAGTGG 


4080 


bGCTGCTGAA 


7\ TV TV TV TV TV^ 

GCAAAAGAAG 


AGGAGGCTGT 


4140 


/T^f* TV f>r*r^Yk r*f^ 

CCCAGGCAGC 


TTCTCATTCC 


CCTGTCCCTC 


4200 


CTTTCACTTT 


GAGATATGAG 


TGGCCCGATA 


4260 


CTGCCCCACG 


TGTTTTCTCT 


TCTCAGGCGA 


4 o A n 

4320 


TTCTAAGCTC 


S VtVI^^K ^^^^^^ 

CAGTGAGGGA 


GGCATCCGCC 


4380 


ACCXGTGAGT 


CTAACTGCCA 


GGCTCTGATG 


4440 


1*/^/^/^ T* TV TV TV tTi 

TCCCAACAAT 


TV y^/^ It TV f^f^ 

CTGACCAAGG 


fPTV TV y^TV ^^TV TV 

TAACAGGAAG 


4500 


ruAGAGCTGT 


GCTGTGAGAC 


JV7VTVJV'JV^TV*n7VTV 

AAAAAGATAA 


>i c f rv 
4560 


GAAAAC r AG A 


AAACT TAG AT 


AGTACCTGGC 


4 DZO 


cgtgtatact 


TV TV TV TV TV TV 

GTAAAGAAAA 


TV ^^Tv /^m^^ 

TGAGCACTGG 


4680 


AGCCCCCCTC 


TV ^^TV TV H M 

ATGACCAAAC 


CCTTCACCTG 


A^ A f\. 

4740 


GGGGTGCTAA 


CACAGAAGCT 


GAGTCCTTAA 


4800 


CTAAA7\AAGA 


GACTGTGTTT 


CATACTCCTC 


4860 


TCCTGGGCAC 


TGCGGGCTTT 


TGCAGATTGT 


4920 


AAACAGCCCT 


TCGTATAGAA 


AAATAAAAAA 


4980 


ACTGCCCTAA 


TGTTGTCCCC 


AGCTATGGGA 


5040 


GCCAAAGAAG 


TTCTTACTCA 


GAGATTGGGA 


5100 


GAAATTAGAC 


CTGGTGGCTG 


TAAGATGGCC 


5160 


AGGACGCAGA 


TAAATTGACT 


CTGAGACAAA 


5220 


AGCCCCCATG 


ACCGATGGCT 


GACTAACGCT 


5280 


CCGATGGACA 


CATTGTCAGA 


GCTTTTTTTG 


5340 
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ACTGAACGAG TGACCTTCGC TCCCCCTGCT ATCCTCGATC TCACTACTGC CTGAGACTTC 5400 

ACCTACTCAT CATTGTGCTG ACATTCTGGC AGAAGAAACT CATACTCGAA ATGATCTGAA 5460

GGATCAGATC AGCCTTGGCC TGAGAGTTTG AGCTGGTACA CGGATGGCAG TAGCCTGGAG 5520 

GTTAAGGGTA AGCGGAAGGC GGGGACAGCA GTGCAGTGGT GGACAGAAAG CAAGTGATCT 5580 

AGGCCAGCAG CCTCCCTAAA GGGACTTCAG CCCACAAAGC CAAACTTGTG GCTTTAATAC 5640 

AAGCTCTGTA AATGGTAAAA AAAAAAAAGT CTACACGGAC AGCAGGTATG CTCTTGCCAC 5700 

TGTACAGAGC AATATACAGA CAAAGAGAAC TGTTGACATC TGCAGAGAAA GACCTAAGAT 5760 

GCTGTGGCTA AAAGAAATCA GATGGCAAAT CTAACCGCCC AGGCATCCTA AAGAGCAATG 5820

ATCCTGACAG TCTGAAGACT ATCAAGTTAT AGACAAATTA AGACTGGTAA AAAAAACCCT 5880 

GTATAAAATA GTAAAAACTG AAAAAAGAAA ACTAGTCCTC TCATGAGAAG ACAGACCTGA 5940 

CATCTACTGA AAAATAGACT TTACTGGAAA AAATATGTGT ATGAATACCT TCTAGTTTTT 6000 

GTGAACGTTC TCAAGATGGA TAAAAGCTTT TCCTTGTAAA ACGAGACTGA TCAGATAGTC 6060 

ATCAAGAAGA TTGTTAAAGA AAATTTTCCA AGGTTCGGAG TGCCAAAAGC AATAGTGTCA 6120 

GATAATGGTC CTGCCTTTGT TGCCCAGGTA AGTCAGGGTG TGGCCAAGTA TTTAGAGGTC 6180 

AAATGAAAAT TCCATTGTGT GTACAGACCT CAGAGCTCAG GAAAGATAAA AAAGAATAAA 6240 

TAAAACTCTA AACAGACCTT GACAAAATTA ATCCTAGAGA CTGGCACAGA CTTACTTGGT 6300 

ACTCCTTCCC CTTGCCCTAT TTAGAACTGA GAATACTCCC TCTTGATTCG GTTTTACTCT 6360 

TTTTAAGATC CTTTATGGGG CTCCTATGCC ATCACTGTCT TAAATGATGT GTTTAAACCT 6420

ATGTTGTTAT AATAATGATC TATATGTTAA GTTAAAAGGC TTGCAGGTGG TGCAGAAAGA 6480 

AGTCTGGTCA CAACTGGCTA CAGTGAACAA GCTGGGTACC CCAAGGACAT CTTACCAGTT 6540 

CCAGCCAGAG ATCTGATCTA C 6561

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
GACTAACCTT GATTCCACTG GAGCCGTATT ACCGCCATGC ATTAGTTATT AATAG 55
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
GACTAACCTT GATTCCACTG GAGTAATTGC GGCTAGCGGA TCTGACG 47 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
GACTAACCTT GATTCCACTG GAGACACTTG ACCTCTACCG CGCCAGTCCT CCGATTGACT 60
GAGTCG 66 
(2) INFORMATION FOR SEQ ID NO: 34: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 
GACTAACCTT GATTCCACTG GAGGGATCCG CGCCCATGAT TATTATCG 48 
(2) INFORMATION FOR SEQ ID NO: 35: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 55 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GACTAACCTT GATTCCAGCA ATGTCATGGC TACAGGCTCC CGGACGTCCC TGCTC 55 
(2) INFORMATION FOR SEQ ID NO:36: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 48 base pairs 
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(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GACTAACCTT GATTCCAGCA ATGTTAGGAC AAGGCTGGTG GGCACTGG 48 
(2) INFORMATION FOR SEQ ID NO:37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
GACTAACCTT GATTCCACTG GAGGGTCGAC CCTGTGGAAT GTGTGTCAG 49 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
GACTAACCTT GATTCCACTG GAGAATCTCG TGATGGCAGG TTGGGCGT 48 
(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
GACTAACCTT GATTCCACTG AAGAGATTTT ATTTAGTCTC CAGAAAAAGG GGGG 54 
(2) INFORMATION FOR SEQ ID NO: 40: 
(i) SEQUENCE CHARACTERISTICS: 
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(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GACTAACCTT GATTCCACTG AAGCCCCCAA ATGAAAGACC CCCGCTGACG 50 
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

GACTAACCTT GATTCCACTG GAGCCGGGAC GGAATTCGTA ATCTGCTGC 49

(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

GACTAACCTT GATTCCACTG GAGTTCTCGA GGCGGCGCAT CTCGGCG 47 
(2) INFORMATION FOR SEQ ID NO: 43: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear






(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
CGCTCTAGAA CTAGTGGATC 20 
(2) INFORMATION FOR SEQ ID NO: 44: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

GTAATACGAC TCACTATAGG G 21

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

CGATCCACTG GAGCTCGGAG CCCACCCCCT CCCATCTAGA GGT 43

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

CGTCCTCCTG GAGAGCACAG GGTAGAGGAG TCTCGACGGT CAG 43

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

CGCAACCCTG GAGACCTCTA GATGGGAGGG GGTGGGCTCC GAG 43
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(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

GCAGGACCTG GAGCTGACCG TCGAGACTCC TCTACCCTGT GCT 43 
(2) INFORMATION FOR SEQ ID NO: 49: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
CGCTCTAGAA CTAGTGGATC 20 
(2) INFORMATION FOR SEQ ID NO: 50: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GTAATACGAC TCACTATAGG G 21 
(2) INFORMATION FOR SEQ ID NO: 51: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 



TACGTATCGA TGGATCCGA 19 
(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
GGATCCATCG ATACGTAAG 19 
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 

GGCCGCTAAC TAATAGCCCA TTCTCCAAGG TACGTAGC 38

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

TACGTACCTT GGAGAATGGG CTATTAGTTA GCGGCCGC 38 
(2) INFORMATION FOR SEQ ID NO:55: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
GACTAACCTT GATTCCACTG GAGTTTTCTC TATTCTTCAT TCCCCACTTC TTCTT 55

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 60 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
GACTAACCTT GATTCCACTG GAGAATCTGG ACCAATTCTA TATAAGCCTG TGAAAAATTT 60


(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

GACTAACCTT GATTCCACTG GAGAAGAAGA AGTGGGGAAT GAAGAA 46 
(2) INFORMATION FOR SEQ ID NO: 58: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear






(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
GACTAACCTT GATTCCACTG GAGATCTCTA GATGGGAGGG GGTCTGGGCT C 51 
(2) INFORMATION FOR SEQ ID NO: 59: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
GACTAACCTT GATTCCACTG GAGCTCGGAG CCCACCCCCT CCCATCT 47 
(2) INFORMATION FOR SEQ ID NO: 60: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 47 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
GACTAACCTT GATTCCACTG GAGGGAGGCC CTTATCTCAA AAATGTT 47 
(2) INFORMATION FOR SEQ ID NO: 61: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GACTAACCTT GATTCCACTG GAGTCTAAGA ACATTTTTGA GATAAGGGCC T 51 
(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

GACTAACCTT GATTCCACTG GAGTCACAGG CTTATATAGT GAAA 44

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
GACTAACCTT GATTCCCTGG AGACTGCACT GCTGTCCCCG CCTTCG 46 
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
GACTAACCTT GATTCCCTGG AGATTTCTCA GACCCGGGTC GACCCTGTGG AAT 53
(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
GACTAACCTT GATTCCCTGG AGCTCGAGGC GGCGCATCTC GGCG 44 
(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
GACTAACCTT GATTCCCTGA AGACCTGCGT CATGCTGAGA CCCTCAA 47 
(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
GACTAACCTT GATTCCCTGA AGCGGCCAAT GCACCAAATG AAAGATTTTC 50
(2) INFORMATION FOR SEQ ID NO: 68: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
CGCATCTTTT AATTAACTGG AGARAATTTT TYACAGGCTT ATATAGKAAA 50 
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We claim: 



1. A method for assembling a gene or gene vector comprising the steps of:
a) designing at least 6 primers to produce at least three fragments in at least
three separate polymerase chain reactions wherein each primer comprises at least one
predetermined restriction endonuclease recognition site that recognizes a restriction
endonuclease that cleaves at a distance from the recognition site, a sequence complementary
to a template sequence for amplification, and bases positioned at the restriction endonuclease
cleavage site that are selected to be complementary to only one other overhanging terminus created
from enzymatic cleavage of the fragments;

b) combining the primers with template nucleic acid and performing a gene
amplification reaction to produce multiple copies of an amplified template fragment
incorporating the restriction endonuclease recognition site;
c) digesting the amplified template fragments with one or more restriction
endonucleases that recognize the restriction endonuclease recognition site of the
primers to create overhanging termini wherein each overhanging terminus is
complementary to only one other overhanging terminus on another fragment; and
d) combining the amplified and digested template fragments in a ligation
reaction to produce a directionally ordered gene, nucleic acid fragment or gene vector.
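
The uniqueness condition recited in steps a) and c) can be illustrated with a short Python sketch that checks whether every overhang in a planned assembly pairs with exactly one other overhang. This is only an illustration under stated assumptions: the fragment names and two-base overhangs below are hypothetical and are not taken from the sequence listing, and each overhang is written 5' to 3' on its own strand.

```python
# Sketch only: fragment names and overhang sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def partners(overhangs):
    """Map each overhang name to the other overhangs it can anneal to."""
    result = {}
    for name, seq in overhangs.items():
        target = reverse_complement(seq)
        result[name] = [
            other for other, other_seq in overhangs.items()
            if other != name and other_seq.upper() == target
        ]
    return result


def is_directional(overhangs):
    """True if each overhang has exactly one partner and none is self-complementary."""
    for name, found in partners(overhangs).items():
        seq = overhangs[name].upper()
        if seq == reverse_complement(seq) or len(found) != 1:
            return False
    return True


# Three fragments intended to ligate in the order A-B-C and close into a circle.
plan = {
    "A_right": "AC", "B_left": "GT",
    "B_right": "CA", "C_left": "TG",
    "C_right": "GA", "A_left": "TC",
}
print(is_directional(plan))
```

A self-complementary overhang such as AT is rejected by this check because it could anneal to a copy of itself, which would defeat the directional ordering the claim requires.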



2. The method of claim 1 wherein the restriction endonuclease is at least one class IIS 
restriction endonuclease. 


3. The method of claim 2 wherein the class IIS restriction endonuclease is selected from the
group consisting of: AlwI, Alw26I, BbsI, BbvI, BbvII, BpmI, BsmAI, BsmI, BsmBI, BspMI,
BsrI, BsrDI, Eco57I, EarI, FokI, GsuI, HgaI, HphI, MboII, MnlI, PleI, SapI, SfaNI,
TaqII, Tth111II.


4. The method of claim 1 wherein class II restriction endonuclease recognition sites, 
linkers, or adapters are not used to create the gene or gene vector. 
5. The method of claim 1 wherein the product of the ligation reaction is introduced into
prokaryotic or eukaryotic cells. 

6. The method of claim 1 wherein at least one target nucleic acid sequence is chosen
from the group consisting of: transcriptional regulatory sequences; genetic vectors; introns
and/or exons; viral encapsidation sequences; integration signals intended for introducing 
nucleic acid molecules into other nucleic acid molecules; retrotransposon(s); VL30 elements; 
or multiple allelic forms of a sequence. 


7. The method of claim 1 wherein the method is used to generate combinatorial libraries
of a target sequence. 

8. The method of claim 7 wherein the target sequence is part or all of a gene.


9. The method of claim 8 wherein the gene encodes a protein. 

10. The method of claim 8 wherein the primers amplify allelic variants of part or all of a 
gene. 
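
As a rough illustration of how a combinatorial library of the kind recited in claims 7 to 10 grows, the Python sketch below enumerates every ordered combination when several variants are available at each fragment position. The three-base "fragments" are placeholders invented for this example and do not correspond to any sequence in the listing.

```python
from itertools import product


def assemble_library(variants_by_position):
    """Join one variant from each position, in order, for every combination."""
    return ["".join(combo) for combo in product(*variants_by_position)]


# Hypothetical variants: one choice at position 1, two at position 2, three at position 3.
positions = [["ATG"], ["GCA", "GCT"], ["TAA", "TAG", "TGA"]]
library = assemble_library(positions)
print(len(library))
```

The library size is the product of the number of variants at each position, so adding allelic variants at even a few positions increases the number of distinct assemblies quickly.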


11. The method of claim 1 wherein the product of the ligation reaction is passed between
eukaryotic cells using a virus particle, by cell fusion, or by transfection. 

12. The method of claim 1 wherein the product of the ligation reaction is not introduced 
into prokaryotic cells.

13. The method of claim 1 further combining at least one screening or selection step to 
select the products of the ligation reaction. 

14. The method of claim 1 wherein the product of the ligation reaction is mutated during
passage in cells in order to generate genetic diversity. 
15. The method of claim 14 wherein the product of the ligation reaction is mutated by
homologous recombination during passage in cells. 

16. The method of claim 1, wherein the method is used to isolate and identify regulatory 
sequences from a cell.

17. The method of claim 11, wherein cells containing the product of the ligation reaction
are selected for enhanced biological activity. 

18. The method of claim 17, wherein the cells containing the product of the ligation
reaction are selected for tissue-specific, hormone-specific or developmental-specific gene
expression. 

19. The method of claim 1 wherein the product of the ligation reaction is a circularized 
gene vector.

20. A nucleic acid primer having a 5' and a 3' end to amplify a nucleic acid fragment for the 
ligation of at least two fragments comprising: 

a restriction endonuclease recognition site that recognizes a restriction endonuclease, 
wherein the restriction endonuclease cleaves at a distance from the recognition site and
creates overhanging termini; 

a sequence complementary to a template sequence to be amplified to produce the 
nucleic acid fragment; 

at least two nucleic acid bases positioned at the restriction endonuclease cleavage site 
and that form an overhanging terminus after cleavage by the restriction endonuclease,

wherein the at least two nucleic acid bases are selected to be complementary to only one other 
overhanging terminus on another fragment of the ligation; and 

an affinity handle on the end of the primer. 



21. The primer of claim 20 further comprising an anchor to provide stability to the
restriction enzyme at the restriction enzyme recognition site. 
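
For illustration only, and not as part of the claims, the primer layout of claims 20 and 21 and the requirement that each overhang pair with only one other terminus can be sketched in Python. The function names, the example sequences, and the choice of a BsaI-like site (GGTCTC, cutting one base downstream to leave a four-base 5' overhang) are assumptions made for this sketch; claims 20 and 21 do not name a specific enzyme. The affinity handle is a chemical modification and is not modelled here. Rejecting palindromic overhangs is likewise an assumption, made because a self-complementary terminus could pair with a copy of itself.

```python
# Hypothetical sketch: all names and sequences below are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_primer(anchor: str, site: str, spacer: str,
                 overhang: str, annealing: str) -> str:
    """Concatenate the primer parts 5'->3'.

    anchor    : extra 5' bases that give the enzyme room to bind (claim 21)
    site      : recognition site of an enzyme that cuts outside its site
    spacer    : bases between the site and the cut position
    overhang  : bases that become the overhanging terminus after cleavage
    annealing : sequence complementary to the template
    """
    return (anchor + site + spacer + overhang + annealing).upper()


def overhangs_pair_uniquely(overhangs: list[str]) -> bool:
    """Check that every terminus has exactly one partner among the others.

    Each overhang is written 5'->3'. Two termini can anneal when one is
    the reverse complement of the other. Palindromic overhangs are
    rejected because they could anneal to copies of themselves.
    """
    tails = [o.upper() for o in overhangs]
    for i, tail in enumerate(tails):
        partner = reverse_complement(tail)
        if partner == tail:
            return False  # self-complementary terminus
        matches = sum(1 for j, other in enumerate(tails)
                      if j != i and other == partner)
        if matches != 1:
            return False
    return True


# Three fragments A, B, C joined in a circle: junctions A-B, B-C, C-A.
termini = ["AACG", "CGTT",   # A right end, B left end
           "GGAT", "ATCC",   # B right end, C left end
           "TCAG", "CTGA"]   # C right end, A left end
ordered_assembly_possible = overhangs_pair_uniquely(termini)

# Assumed BsaI-like layout: site GGTCTC, one spacer base, four-base overhang.
example_primer = build_primer("GCGC", "GGTCTC", "A", "AACG", "ATGGCC")
```

With termini chosen this way, each fragment end has a single possible ligation partner, which is what fixes the order and orientation of the fragments in the assembled product.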

22. A method for isolating and identifying promoters comprising the steps of:

a) obtaining a vector comprising at least a portion of a promoter region from a 
retrovirus transposon LTR and having two non-complementary overhanging termini; 

b) designing at least two PCR primers to amplify at least one region of a 
retro-transposon LTR from template nucleic acid to produce at least one nucleic acid
fragment wherein each primer comprises at least one predetermined restriction endonuclease
recognition site that recognizes a restriction endonuclease that cleaves at a distance from the
recognition site, a sequence complementary to a template sequence from a retrovirus
transposon, and bases positioned at the restriction endonuclease cleavage site that are selected
to be complementary to only one other overhanging terminus of the vector wherein the
restriction endonuclease cleavage site is created from enzymatic cleavage of the fragments;

c) combining the primers with template nucleic acid and performing a gene
amplification reaction to produce multiple copies of an amplified template fragment
incorporating the restriction endonuclease recognition site;

d) digesting the amplified template fragments with one or more restriction
endonucleases that recognize the restriction endonuclease recognition site of the primer
to create overhanging termini; and

e) combining the amplified and digested template fragment in a ligation reaction 
with the vector to produce a gene vector with an intact LTR sequence.
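
For illustration only, steps d) and e) can be sketched by computing the overhangs that digestion would leave on an amplified fragment. The sketch assumes a BsaI-like enzyme (site GGTCTC, top-strand cut one base after the site, four-base 5' overhang); the claim itself does not name an enzyme. The function `digest_insert` and the example amplicon are hypothetical.

```python
# Hypothetical sketch: enzyme geometry and sequences are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def digest_insert(amplicon: str, site: str = "GGTCTC",
                  offset: int = 1, overhang_len: int = 4):
    """Simulate digestion of an amplicon carrying one site at each end.

    The forward primer contributes the site on the top strand; the
    reverse primer contributes it on the bottom strand, so it appears
    as the reverse complement near the 3' end of the top strand.
    Returns (left_overhang, double_stranded_core, right_overhang),
    with each overhang written 5'->3' on the strand that carries it.
    """
    top = amplicon.upper()
    fwd = top.find(site)
    rev = top.rfind(reverse_complement(site))
    if fwd == -1 or rev == -1 or rev <= fwd:
        raise ValueError("amplicon must carry a site at both ends")

    left_cut = fwd + len(site) + offset   # top-strand cut near the left end
    right_cut = rev - offset              # bottom-strand cut, in top-strand coordinates
    if right_cut - left_cut < 2 * overhang_len:
        raise ValueError("sites are too close together")

    left_overhang = top[left_cut:left_cut + overhang_len]
    right_overhang = reverse_complement(top[right_cut - overhang_len:right_cut])
    core = top[left_cut + overhang_len:right_cut - overhang_len]
    return left_overhang, core, right_overhang


# Example amplicon: anchor, site, spacer, overhang, insert, overhang, spacer,
# reverse-complemented site, anchor.
amplicon = ("GCGC" "GGTCTC" "A" "AACG" "ATGGCCTTAA"
            "GGAT" "T" "GAGACC" "GCGC")
left, core, right = digest_insert(amplicon)

# A vector terminus can accept an insert end only if its overhang is the
# reverse complement of that end.
vector_left_terminus = reverse_complement(left)
vector_right_terminus = reverse_complement(right)
```

Because the two insert overhangs differ and neither is self-complementary, the insert can ligate to a vector with matching non-complementary termini in only one orientation.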

23. The method of claim 22 wherein the template nucleic acid is DNA or RNA. 

24. The method of claim 22 further comprising the step of sequencing the insert to 
identify the promoter sequence. 

25. Promoter sequences of SEQ ID NOS:2-13 identified using the methods of claim 22. 

26. The vector of SEQ ID NO: 1.



[Flow chart]

Genomic DNA or cellular RNA

Amplification of allelic parts via PCR or RT-PCR

Combine the parts in defined order using self-assembling genes

Grow constructs en masse

Transfect cells with constructs + replication-competent retrovirus

Reisolate vectors after several passages